1. Introduction


Let $(S, \mathcal{S})$ be a measurable space, and let $X, X_1, \dots, X_n$ be a sequence of i.i.d. random variables taking values in $(S, \mathcal{S})$ with a common distribution $P$. We assume that $S$ is a separable metric space and $\mathcal{S}$ is its Borel $\sigma$-field. Let $\mathcal{F}$ be a class of measurable functions $f: S \to \mathbb{R}$ with a measurable envelope $F: S \to \mathbb{R}$ satisfying $F(x) \geq \sup_{f \in \mathcal{F}} |f(x)|$ for all $x \in S$. Define the empirical process indexed by $\mathcal{F}$:
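
$$\mathbb{G}_n f = \frac{1}{\sqrt{n}} \sum_{i=1}^n \big(f(X_i) - Pf\big), \quad f \in \mathcal{F},$$
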
where $Pf = \int f \, dP = E[f(X)]$. Let $e_1, \dots, e_n$ be independent standard Gaussian random variables independent of $X_1^n := \{X_1, \dots, X_n\}$. Define the multiplier bootstrap process indexed by $\mathcal{F}$:
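
$$\mathbb{G}_n^e f = \frac{1}{\sqrt{n}} \sum_{i=1}^n e_i \big(f(X_i) - P_n f\big), \quad f \in \mathcal{F}, \tag{1}$$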


where $P_n$ is the empirical measure with respect to $X_1, \dots, X_n$, that is, $P_n f = n^{-1} \sum_{i=1}^n f(X_i)$ for $f \in \mathcal{F}$. Let $N_1, \dots, N_n$ be a sequence of random variables multinomially distributed with parameters $n$ and (probabilities) $1/n, \dots, 1/n$ that are independent of $X_1^n$. Define the empirical bootstrap process indexed by $\mathcal{F}$:

$$\mathbb{G}_n^* f = \frac{1}{\sqrt{n}} \sum_{i=1}^n (N_i - 1) f(X_i), \quad f \in \mathcal{F}.$$

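To make the three processes concrete, the following Python sketch evaluates $\mathbb{G}_n f$, $\mathbb{G}_n^e f$ and $\mathbb{G}_n^* f$ on a finite grid of functions. The class $f_t(x) = 1\{x \leq t\}$, the uniform law of $X$ (so that $Pf_t = t$), and the functional `B` are illustrative assumptions, not part of the paper's setting; the maxima over the grid are discretized stand-ins for the suprema studied below.

```python
import numpy as np

def three_processes(x, t, rng):
    """Evaluate G_n f, G_n^e f and G_n^* f for the illustrative class f_t(x) = 1{x <= t}.

    Assumes X ~ Uniform[0, 1), so that Pf_t = t for t in [0, 1].
    """
    n = len(x)
    fx = (x[:, None] <= t[None, :]).astype(float)   # fx[i, k] = f_{t_k}(X_i)
    Pn_f = fx.mean(axis=0)                           # P_n f
    G_n = np.sqrt(n) * (Pn_f - t)                    # empirical process
    e = rng.standard_normal(n)                       # i.i.d. N(0, 1) multipliers
    G_e = e @ (fx - Pn_f) / np.sqrt(n)               # multiplier bootstrap process, as in (1)
    N = rng.multinomial(n, np.full(n, 1.0 / n))      # Multinomial(n; 1/n, ..., 1/n) weights
    G_star = (N - 1) @ fx / np.sqrt(n)               # empirical bootstrap process
    return G_n, G_e, G_star

rng = np.random.default_rng(0)
x = rng.uniform(size=500)
t = np.linspace(0.0, 1.0, 101)
G_n, G_e, G_star = three_processes(x, t, rng)
B = -np.abs(t - 0.5)   # hypothetical non-centering functional B(f_t)
# Discretized versions of Z, Z^e and Z^* (maxima over the grid of functions)
print(np.max(B + G_n), np.max(B + G_e), np.max(B + G_star))
```

At $t = 1$ every observation satisfies $f_1(X_i) = 1$, so all three processes vanish there, which is a quick sanity check of the centering.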
Suppose that $\mathcal{F} \subset \mathcal{L}^2(P)$ is a VC type class of functions (the definition of VC type classes is recalled in Section 2) with $\sup_{f \in \mathcal{F}} |Pf| < \infty$. Then $\mathcal{F}$ is totally bounded with respect to the semimetric

$$e_P(f, g) = \sqrt{P(f - g)^2}, \quad f, g \in \mathcal{F},$$

and there exists a centered Gaussian process $G_P$ indexed by $\mathcal{F}$ with uniformly $e_P$-continuous sample paths and covariance function

$$E[G_P(f) G_P(g)] = \mathrm{Cov}\big(f(X), g(X)\big), \quad f, g \in \mathcal{F}. \tag{2}$$
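For the same illustrative class of half-line indicators with $X$ uniform on $[0,1]$ (an assumption made only for this sketch), the covariance (2) is $\mathrm{Cov}(f_s(X), f_t(X)) = \min(s,t) - st$, so $G_P$ is a Brownian bridge. The snippet below samples $G_P$ on a grid and compares the empirical covariance of the draws with (2).

```python
import numpy as np

# Covariance (2) for f_t(x) = 1{x <= t} with X ~ Uniform[0, 1):
# Cov(f_s(X), f_t(X)) = P(X <= min(s, t)) - s * t = min(s, t) - s * t.
t = np.linspace(0.0, 1.0, 51)
cov = np.minimum.outer(t, t) - np.outer(t, t)

rng = np.random.default_rng(0)
# cov is singular (Var G_P f_0 = Var G_P f_1 = 0); the default SVD-based sampler handles this
paths = rng.multivariate_normal(np.zeros(len(t)), cov, size=20_000)
emp_cov = np.cov(paths, rowvar=False)
print(np.max(np.abs(emp_cov - cov)))   # Monte Carlo error only
```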
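In this paper, for a given functional $B: \mathcal{F} \to \mathbb{R}$, we are interested in constructing couplings for

$$Z = \sup_{f \in \mathcal{F}} \big(B(f) + \mathbb{G}_n f\big) \quad \text{and} \quad \tilde{Z} \overset{d}{=} \sup_{f \in \mathcal{F}} \big(B(f) + G_P f\big), \tag{3}$$

$$Z^e = \sup_{f \in \mathcal{F}} \big(B(f) + \mathbb{G}_n^e f\big) \quad \text{and} \quad \tilde{Z}^e \overset{d \mid X_1^n}{=} \sup_{f \in \mathcal{F}} \big(B(f) + G_P f\big), \tag{4}$$

$$Z^* = \sup_{f \in \mathcal{F}} \big(B(f) + \mathbb{G}_n^* f\big) \quad \text{and} \quad \tilde{Z}^* \overset{d \mid X_1^n}{=} \sup_{f \in \mathcal{F}} \big(B(f) + G_P f\big), \tag{5}$$

such that the random variables appearing in each line are close to each other with high probability. The notation $\overset{d}{=}$ means equality in distribution, and $\overset{d \mid X_1^n}{=}$ means equality in conditional distribution given $X_1^n = \{X_1, \dots, X_n\}$.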
Here we suppose that the probability space is such that

$$(\Omega, \mathcal{A}, \mathbb{P}) = (S^n, \mathcal{S}^n, P^n) \times (T, \mathcal{T}, Q) \times \big([0,1], \mathcal{B}([0,1]), \lambda\big),$$

where $X_1, \dots, X_n$ are the coordinate projections of $(S^n, \mathcal{S}^n, P^n)$, the random variables $e_1, \dots, e_n$ (or $N_1, \dots, N_n$) depend on the "second" coordinate only, and $([0,1], \mathcal{B}([0,1]), \lambda)$ is the Lebesgue probability space on $[0,1]$, that is, $\mathcal{B}([0,1])$ is the Borel $\sigma$-field on $[0,1]$ and $\lambda$ is the Lebesgue measure on $[0,1]$. The last augmentation of the probability space enables us to generate a uniform random variable on $[0,1]$ independent of $X_1, \dots, X_n$ and $e_1, \dots, e_n$ (or $N_1, \dots, N_n$). We also implicitly assume here that the functional $B$ and the class $\mathcal{F}$ are "nice" enough so that measurability problems do not arise; see Section 2 for explicit assumptions.

Our coupling constructions are based on the Slepian-Stein methods and Gaussian comparison inequalities, and build on the ideas in [...]. We emphasize that the construction of couplings in this paper is non-asymptotic, and so the class of functions $\mathcal{F} = \mathcal{F}_n$ may depend on $n$, and its complexity may grow as the sample size increases. This feature of the couplings is especially important in modern nonparametric statistics [14]; see [6] and [7] for examples of applications.

We also emphasize that our couplings are not of the Hungarian type, and so are different from those obtained in, e.g., [10] and [23], among many others. In particular, in contrast to, e.g., [23], our couplings do not depend on the maximal total variation in $\mathcal{F}$. Instead, the couplings depend only on VC properties of the class of functions $\mathcal{F}$ as well as on certain moments of the functions in $\mathcal{F}$ and the envelope $F$. This feature of the construction leads to a different range of possible applications in comparison with Hungarian couplings; see the detailed discussion in [...].

Gaussian and bootstrap approximations of the supremum of a non-centered empirical process have many potential applications. For example, these approximations can be used to derive non-asymptotic bounds on the errors in the multivariate CLT. Specifically, let $S = \mathbb{R}^p$, and let $A$ be a closed convex set in $S$. For $\mathbb{S}^{p-1} = \{v \in \mathbb{R}^p : \|v\| = 1\}$, let $\mathcal{S}_A: \mathbb{S}^{p-1} \to \mathbb{R}$ be the support function of $A$ defined by $\mathcal{S}_A(v) = \sup_{x \in A} v^T x$. Then $x \in A$ if and only if $\sup_{v \in \mathbb{S}^{p-1}} (v^T x - \mathcal{S}_A(v)) \leq 0$. Therefore, our results can be used to approximate

$$\mathbb{P}\Big(\frac{1}{\sqrt{n}} \sum_{i=1}^n X_i \in A\Big) = \mathbb{P}\Big(\sup_{v \in \mathbb{S}^{p-1}} \Big(\frac{1}{\sqrt{n}} \sum_{i=1}^n v^T X_i - \mathcal{S}_A(v)\Big) \leq 0\Big). \tag{6}$$


Here, the dimension $p = p_n$ of the sample space $S = \mathbb{R}^p$ can depend on the sample size $n$ and increase as $n$ grows. Importantly, if the set $A$ is such that the set $\mathbb{S}^{p-1}$ on the right-hand side of (6) can be reduced to a sufficiently small subset of $\mathbb{S}^{p-1}$, the Gaussian approximation becomes possible even if $p$ is larger or much larger than $n$; see [5] and [9] for examples. More broadly, one can use our results for distributional approximation of general convex functionals on $\mathbb{R}^p$ where the probability measure on $\mathbb{R}^p$ is given by the distribution of a normalized sum of i.i.d. random vectors; see Section 11 of [10], where it is demonstrated that such functionals can be represented as suprema of non-centered empirical processes.
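The characterization of $A$ through its support function can be checked numerically. The sketch below takes, as an assumption for illustration, the box $A = [-r, r]^p$, whose support function is $\mathcal{S}_A(v) = r\|v\|_1$; for this set the supremum over $\mathbb{S}^{p-1}$ in (6) can be replaced by a maximum over the $2p$ unit vectors $\pm e_j$ without changing the event, which is an instance of the reduction to a small subset of directions mentioned above.

```python
import numpy as np

def support_box(v, r=1.0):
    """Support function of the box A = [-r, r]^p: S_A(v) = sup_{x in A} v^T x = r * ||v||_1."""
    return r * np.abs(v).sum(axis=-1)

def in_A_via_support(x, directions, r=1.0):
    """Test x in A through max_v (v^T x - S_A(v)) <= 0 over a finite set of unit directions."""
    return bool(np.max(directions @ x - support_box(directions, r)) <= 0)

p = 3
# For the box, the 2p unit vectors +e_j and -e_j suffice: v^T x - S_A(v) = +-x_j - r
directions = np.vstack([np.eye(p), -np.eye(p)])
print(in_A_via_support(np.array([0.5, -0.9, 0.2]), directions))  # True
print(in_A_via_support(np.array([0.5, -1.2, 0.2]), directions))  # False
```

For a general closed convex $A$ no such finite reduction need exist, which is why the size of the relevant subset of $\mathbb{S}^{p-1}$ matters.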

Another possible application is to study power properties of nonparametric tests where, under the null, the statistic can be approximated by $\sup_{f \in \mathcal{F}} \mathbb{G}_n f$, and under the alternative, the statistic can be approximated by $\sup_{f \in \mathcal{F}} (B(f) + \mathbb{G}_n f)$, the functional $B$ representing deviations from the null hypothesis. Finally, non-centered empirical processes are useful for multiscale testing, where one combines many statistics corresponding to different scales into one test using a scale-dependent critical value for each statistic; see [13], where such tests were used for testing qualitative hypotheses.

This paper builds upon but differs from our previous papers [...]. In particular, this paper establishes, in the infinite-dimensional setting, formal results on the multiplier and empirical bootstraps when the envelope $F$ may be unbounded. In addition, this paper allows one to approximate the supremum of a possibly non-centered empirical process. These settings are not covered in our previous papers [...] and are new.
The organization of this paper is as follows. In the next section, we present our main coupling theorems. In Section 3, we derive two auxiliary theorems that deal with maxima of high-dimensional random vectors. All the proofs are deferred to Sections 4 and 5. For the convenience of the reader, we collect in Section 6 some additional results that are useful in our derivations.


1.1. Notation

We use standard notation from the empirical process literature. For any probability measure $Q$ on a measurable space $(S, \mathcal{S})$, we use the notation $Qf = \int f \, dQ$. For $p \geq 1$, we use $\mathcal{L}^p(Q)$ to denote the space of all measurable functions $f: S \to \mathbb{R}$ such that $\|f\|_{Q,p} = (Q|f|^p)^{1/p} < \infty$. We define the (semi)metric $e_Q$ on $\mathcal{L}^2(Q)$ by $e_Q(f, g) = \|f - g\|_{Q,2}$, $f, g \in \mathcal{L}^2(Q)$.

For $\varepsilon > 0$, an $\varepsilon$-net of a (semi)metric space $(T, d)$ is a subset $T_\varepsilon$ of $T$ such that for every $t \in T$ there exists a point $t_\varepsilon \in T_\varepsilon$ with $d(t, t_\varepsilon) \leq \varepsilon$. The $\varepsilon$-covering number $N(T, d, \varepsilon)$ of $T$ is the infimum of the cardinality of $\varepsilon$-nets of $T$, that is, $N(T, d, \varepsilon) = \inf\{\mathrm{Card}(T_\varepsilon) : T_\varepsilon \text{ is an } \varepsilon\text{-net of } T\}$. For a subset $A$ of a semimetric space $(T, d)$, we use $A^\delta$ to denote the $\delta$-enlargement of $A$, that is, $A^\delta = \{x \in T : d(x, A) \leq \delta\}$, where $d(x, A) = \inf_{y \in A} d(x, y)$. We also use the notation $\|z\|_T = \sup_{t \in T} |z(t)|$ for a function $z$ on $T$.
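A greedy pass over a finite set produces an $\varepsilon$-net, so its size is an upper bound on the covering number $N(T, d, \varepsilon)$. The following sketch assumes a finite $T \subset \mathbb{R}^2$ with the Euclidean metric; it only illustrates the definitions above and is not a procedure used in the paper.

```python
import numpy as np

def greedy_eps_net(points, eps):
    """Greedy eps-net of a finite point set under the Euclidean metric.

    A point is added only if it is farther than eps from every point already chosen,
    so every point ends up within eps of the net and len(net) >= N(T, d, eps).
    """
    net = []
    for t in points:
        if all(np.linalg.norm(t - s) > eps for s in net):
            net.append(t)
    return np.array(net)

rng = np.random.default_rng(0)
T = rng.uniform(size=(500, 2))
net = greedy_eps_net(T, eps=0.1)
# eps-net property: max_t min_s d(t, s) <= eps
dists = np.linalg.norm(T[:, None, :] - net[None, :, :], axis=-1)
print(len(net), dists.min(axis=1).max() <= 0.1)
```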

For a function $g: \mathbb{R} \to \mathbb{R}$, we write $\|g\|_\infty = \sup_{x \in \mathbb{R}} |g(x)|$, and assuming that $g$ is differentiable, we use $g'$ to denote the derivative of $g$. We denote by $C^k(\mathbb{R})$ the space of $k$-times continuously differentiable functions on $\mathbb{R}$. For $a, b \in \mathbb{R}$, we use the notation $a \vee b = \max\{a, b\}$.

2. Main results 

In this section, we construct couplings between the random variables in (3), (4), and (5) when $\mathcal{F}$ is a VC type class of functions. Recall the definition:

Definition 2.1 (VC type class). Let $\mathcal{F}$ be a class of measurable functions on a measurable space $(S, \mathcal{S})$, to which a measurable envelope $F$ is attached. We say that $\mathcal{F}$ is VC type with envelope $F$ if there are constants $A, v > 0$ such that $\sup_Q N(\mathcal{F}, e_Q, \varepsilon \|F\|_{Q,2}) \leq (A/\varepsilon)^v$ for all $0 < \varepsilon \leq 1$, where the supremum is taken over all finitely discrete probability measures $Q$ on $(S, \mathcal{S})$.
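For a concrete VC type class, take (as an illustrative assumption) the half-line indicators $f_t(x) = 1\{x \leq t\}$ on $\mathbb{R}$ with envelope $F = 1$. For a finitely discrete $Q$ and $s < t$, $e_Q(f_s, f_t) = Q((s,t])^{1/2}$, so a greedy net needs at most $1 + 1/\varepsilon^2$ elements, in line with the bound $(A/\varepsilon)^v$ with $v = 2$ and a suitable constant $A$. The sketch below computes this greedy bound for the empirical measure of a sample, restricting attention to thresholds at the sample points.

```python
import numpy as np

def greedy_cover_halflines(sample, eps):
    """Greedy upper bound on N(F, e_Q, eps * ||F||_{Q,2}) for F = {1{x <= t}} with envelope 1,
    where Q is the empirical measure of `sample` and t ranges over the sample points.

    For s < t, e_Q(f_s, f_t) = sqrt(Q((s, t])), so the closest net element to the left of t
    is the most recently selected one.
    """
    x = np.sort(np.asarray(sample, dtype=float))
    n = len(x)
    net = [x[0]]
    for t in x[1:]:
        s = net[-1]
        # Q((s, t]) = fraction of sample points in (s, t]
        mass = (np.searchsorted(x, t, side="right") - np.searchsorted(x, s, side="right")) / n
        if np.sqrt(mass) > eps:   # f_t is farther than eps from the net: add it
            net.append(t)
    return len(net)

rng = np.random.default_rng(0)
sample = rng.normal(size=1000)
for eps in (0.4, 0.2, 0.1):
    print(eps, greedy_cover_halflines(sample, eps), 1 + 1 / eps**2)
```

Each newly selected threshold carries more than $\varepsilon^2$ of the $Q$-mass to its left neighbour in the net, and these intervals are disjoint, which gives the $1 + 1/\varepsilon^2$ bound.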

Let $B: \mathcal{F} \to \mathbb{R}$ be a given functional, and for $\eta > 0$, let $N_B(\eta)$ be the minimal integer $N$ such that there exist $f_1, \dots, f_N \in \mathcal{F}$ with the property that for every $f \in \mathcal{F}$, there exists $1 \leq j \leq N$ with $|B(f) - B(f_j)| \leq \eta$. We make the following assumptions.
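When $\mathcal{F}$ is finite, $N_B(\eta)$ is a one-dimensional covering problem with centers restricted to the values $B(f)$, and a greedy sweep over the sorted values computes it exactly: the leftmost uncovered value is covered by the largest admissible center, which reaches furthest to the right. The sketch below assumes only a finite list of values $B(f)$; the numbers in the example are made up for illustration.

```python
import numpy as np

def covering_number_B(values, eta):
    """N_B(eta) for a finite list of values B(f): the minimal number of values B(f_j)
    such that every B(f) lies within eta of one of them."""
    v = np.sort(np.asarray(values, dtype=float))
    count, i, n = 0, 0, len(v)
    while i < n:
        u = v[i]                                              # leftmost value not yet covered
        j = np.searchsorted(v, u + eta, side="right") - 1     # largest value <= u + eta
        c = v[j]                                              # it covers u and reaches furthest right
        count += 1
        i = np.searchsorted(v, c + eta, side="right")         # skip everything covered by c
    return count

print(covering_number_B([0.0, 0.1, 0.5, 0.55, 2.0], eta=0.1))
print(covering_number_B(np.zeros(10), eta=0.5))   # B = 0 gives N_B(eta) = 1
```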
(A) There exists a countable subset $\mathcal{G}$ of $\mathcal{F}$ such that for any $f \in \mathcal{F}$, there exists a sequence $g_m \in \mathcal{G}$ with $g_m \to f$ pointwise and $B(g_m) \to B(f)$.

(B) The class of functions $\mathcal{F}$ is VC type with a measurable envelope $F$ and constants $A \geq e$ and $v \geq 1$.

(C) There exist constants $b \geq \sigma > 0$ and $q \in [4, \infty)$ such that $\sup_{f \in \mathcal{F}} P|f|^k \leq \sigma^2 b^{k-2}$ for $k = 2, 3, 4$, and $\|F\|_{P,q} \leq b$.

Assumptions (B) and (C) guarantee that $\mathcal{F}$ is totally bounded with respect to the semimetric $e_P$, and that there exists a centered Gaussian process $G_P$ indexed by $\mathcal{F}$ with uniformly $e_P$-continuous sample paths and covariance function given in (2).

Pick any $\eta > 0$ and put

$$K_n = K_n(v, A, b, \sigma, B, \eta) = \log N_B(\eta) + v\big(\log n \vee \log(Ab/\sigma)\big).$$

The following theorem provides a coupling for $Z$ and $\tilde{Z}$.

Theorem 2.1 (Coupling for the supremum of the empirical process). Suppose that assumptions (A)-(C) are satisfied, and in addition suppose that $K_n^3 \leq n$. Let $Z = \sup_{f \in \mathcal{F}} (B(f) + \mathbb{G}_n f)$. Then for every $\gamma \in (0, 1)$, there exists a random variable $\tilde{Z} \overset{d}{=} \sup_{f \in \mathcal{F}} (B(f) + G_P f)$ such that

$$\mathbb{P}\big\{|Z - \tilde{Z}| > C_1(\eta + \delta_n^{(1)})\big\} \leq C_2(\gamma + n^{-1}),$$

where $C_1, C_2$ are positive constants that depend only on $q$, and

$$\delta_n^{(1)} = \delta_n^{(1)}(v, A, b, \sigma, q, B, \eta, \gamma) = \frac{bK_n}{\gamma^{1/q} n^{1/2 - 1/q}} + \frac{(b\sigma^2 K_n^2)^{1/3}}{\gamma^{1/3} n^{1/6}}. \tag{7}$$


The result in Theorem 2.1 is new because it allows for non-centered processes. In addition, even in the case of centered processes, i.e., when $B = 0$, the bound here improves slightly on our previous result given in Corollary 2.2 of [...].


Remark 2.1 (Comparison with Beck's [1] lower bounds). Suppose that $\mathcal{F}$ is the class of indicators of closed balls in $\mathbb{R}^d$, and $X_1, X_2, \dots$ are i.i.d. uniform random variables on $[0,1]^d$. Then [18] proved, via KMT constructions, that there exist versions $B_n$ of $G_P$ such that

$$\|\mathbb{G}_n - B_n\|_{\mathcal{F}} := \sup_{f \in \mathcal{F}} |\mathbb{G}_n f - B_n f| = O\{n^{-1/(2d)} (\log n)^{1/2}\} \quad \text{a.s.}, \tag{8}$$

and, up to a possible power of $\log n$, this rate is best possible when $d \geq 2$ (Beck [1, Theorem 2]).

We shall apply Theorem 2.1 to this class of functions. In this example, $B = 0$, and so $N_B(\eta) = 1$. Since the class of closed balls in $\mathbb{R}^d$ is a VC class with index $d + 2$ [see [11]], assumption (B) is satisfied with $F = 1$, $v = cd$ for some universal constant $c$, and $A$ being some universal constant. In addition, assumption (C) is satisfied with $\sigma = b = 1$ and arbitrary $q \in [4, \infty)$, so there is a universal constant $c'$ such that

$$\delta_n^{(1)} \leq c'\big\{\gamma^{-1/q} d n^{-1/2 + 1/q} \log n + \gamma^{-1/3} d^{2/3} n^{-1/6} (\log n)^{2/3}\big\}.$$

If we take $\gamma = \gamma_n \to 0$ sufficiently slowly, say $\gamma_n = (\log n)^{-1/2}$, then for $Z_n = \sup_{f \in \mathcal{F}} \mathbb{G}_n f$, Theorem 2.1 implies that there exists a sequence $\tilde{Z}_n$ of random variables with $\tilde{Z}_n \overset{d}{=} \sup_{f \in \mathcal{F}} G_P f$ such that

$$|Z_n - \tilde{Z}_n| = O_{\mathbb{P}}\{d n^{-1/2 + 1/q} (\log n)^{1 + 1/(2q)} + d^{2/3} n^{-1/6} (\log n)^{5/6}\}. \tag{9}$$

This holds even when $d = d_n \to \infty$ as long as $d \log n = o(n^{1/3})$ (which guarantees $K_n^3 \leq n$), and the right-hand side of (9) is $o_{\mathbb{P}}(1)$ if $d (\log n)^{5/4} = o(n^{1/4})$, by setting $q$ large enough. It is then clear that, although Theorem 2.1 is only applicable to the supremum and there is a difference in the mode of convergence, the rate of approximation of our coupling in (9) is better than that implied by (8) when $d$ is large. ■
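For the reader's convenience, the exponents in (9) of Remark 2.1 can be traced as follows. This worked computation only substitutes $\gamma_n = (\log n)^{-1/2}$ into the bound on $\delta_n^{(1)}$ displayed in the remark, and takes $\eta = \delta_n^{(1)}$, which is allowed because $N_B(\eta) = 1$ for every $\eta > 0$ when $B = 0$:

$$\gamma_n^{-1/q} = (\log n)^{1/(2q)}, \qquad \gamma_n^{-1/3} = (\log n)^{1/6},$$

$$\delta_n^{(1)} \leq c'\big\{d\, n^{-1/2+1/q} (\log n)^{1+1/(2q)} + d^{2/3} n^{-1/6} (\log n)^{2/3+1/6}\big\}, \qquad \tfrac{2}{3} + \tfrac{1}{6} = \tfrac{5}{6}.$$

Since $C_2(\gamma_n + n^{-1}) \to 0$, Theorem 2.1 then yields (9). Moreover,

$$d^{2/3} n^{-1/6} (\log n)^{5/6} = \big\{d (\log n)^{5/4} n^{-1/4}\big\}^{2/3},$$

which is where the condition $d(\log n)^{5/4} = o(n^{1/4})$ comes from.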

Next we provide a coupling for $Z^e$ and $\tilde{Z}^e$.

Theorem 2.2 (Coupling for the supremum of the multiplier bootstrap process). Suppose that assumptions (A)-(C) are satisfied, and in addition suppose that $K_n \leq n$. Let $Z^e = \sup_{f \in \mathcal{F}} (B(f) + \mathbb{G}_n^e f)$. Then for every $\gamma \in (0, 1)$, there exists a random variable $\tilde{Z}^e \overset{d \mid X_1^n}{=} \sup_{f \in \mathcal{F}} (B(f) + G_P f)$ such that

$$\mathbb{P}\big\{|Z^e - \tilde{Z}^e| > C_3(\eta + \delta_n^{(2)})\big\} \leq C_4(\gamma + n^{-1}),$$

where $C_3, C_4$ are positive constants that depend only on $q$, and

$$\delta_n^{(2)} = \delta_n^{(2)}(v, A, b, \sigma, q, B, \eta, \gamma) = \frac{bK_n}{\gamma^{1 + 1/q} n^{1/2 - 1/q}} + \frac{(b\sigma K_n^{3/2})^{1/2}}{\gamma^{1 + 1/q} n^{1/4}}.$$

Finally, we provide a coupling for $Z^*$ and $\tilde{Z}^*$.
Theorem 2.3 (Coupling for the supremum of the empirical bootstrap process). Suppose that assumptions (A)-(C) are satisfied, and in addition suppose that $K_n^3 \leq n$. Let $Z^* = \sup_{f \in \mathcal{F}} (B(f) + \mathbb{G}_n^* f)$. Then for every $\gamma \in (0, 1)$, there exists a random variable $\tilde{Z}^* \overset{d \mid X_1^n}{=} \sup_{f \in \mathcal{F}} (B(f) + G_P f)$ such that

$$\mathbb{P}\big\{|Z^* - \tilde{Z}^*| > C_5(\eta + \delta_n^{(3)})\big\} \leq C_6(\gamma + n^{-1}),$$

where $C_5, C_6$ are positive constants that depend only on $q$, and

$$\delta_n^{(3)} = \delta_n^{(3)}(v, A, b, \sigma, q, B, \eta, \gamma) = \frac{bK_n}{\gamma^{1 + 1/q} n^{1/2 - 1/q}} + \frac{(b\sigma^2 K_n^2)^{1/3}}{\gamma^{1/3} n^{1/6}} + \frac{(b\sigma K_n^{3/2})^{1/2}}{\gamma^{1 + 1/q} n^{1/4}}.$$
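The quantities $K_n$, $\delta_n^{(1)}$, $\delta_n^{(2)}$ and $\delta_n^{(3)}$ are explicit, so they can be tabulated directly. The Python helper below is a sketch that evaluates them with the unspecified constants $C_1, \dots, C_6$ set aside; the parameter values in the example are arbitrary.

```python
import math

def K_n(n, v, A, b, sigma, N_B):
    """K_n = log N_B(eta) + v * max(log n, log(A b / sigma))."""
    return math.log(N_B) + v * max(math.log(n), math.log(A * b / sigma))

def rates(n, K, b, sigma, q, gamma):
    """delta_n^(1), delta_n^(2), delta_n^(3) of Theorems 2.1-2.3 (constants omitted)."""
    r1 = b * K / (gamma ** (1 / q) * n ** (0.5 - 1 / q))
    r2 = (b * sigma**2 * K**2) ** (1 / 3) / (gamma ** (1 / 3) * n ** (1 / 6))
    r3 = b * K / (gamma ** (1 + 1 / q) * n ** (0.5 - 1 / q))
    r4 = (b * sigma * K**1.5) ** 0.5 / (gamma ** (1 + 1 / q) * n ** 0.25)
    return r1 + r2, r3 + r4, r3 + r2 + r4

n = 10**6
K = K_n(n, v=1.0, A=math.e, b=1.0, sigma=1.0, N_B=1)
print(K, K**3 <= n)   # here K_n = log n, and the condition K_n^3 <= n holds
print(rates(n, K, b=1.0, sigma=1.0, q=8, gamma=0.1))
```

Because $\gamma < 1$, the first term of $\delta_n^{(3)}$ dominates the first term of $\delta_n^{(1)}$, so $\delta_n^{(3)}$ is never smaller than $\delta_n^{(1)}$ or $\delta_n^{(2)}$.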

Remark 2.2. By Markov's inequality, the following inequality is directly deduced from Theorem 2.2: under the conditions of Theorem 2.2, for every $\alpha \in (0, 1)$, with probability at least $1 - \alpha$, we have

$$\mathbb{P}\big\{|Z^e - \tilde{Z}^e| > C_3(\eta + \delta_n^{(2)}) \mid X_1^n\big\} \leq \alpha^{-1} C_4(\gamma + n^{-1}).$$

Likewise, the following inequality is directly deduced from Theorem 2.3: under the conditions of Theorem 2.3, for every $\alpha \in (0, 1)$, with probability at least $1 - \alpha$, we have

$$\mathbb{P}\big\{|Z^* - \tilde{Z}^*| > C_5(\eta + \delta_n^{(3)}) \mid X_1^n\big\} \leq \alpha^{-1} C_6(\gamma + n^{-1}).$$


Remark 2.3. In applications to statistics, it is often more useful to have bounds on the Kolmogorov distance for the following pairs of distribution functions: $\mathbb{P}(Z \leq \cdot)$ and $\mathbb{P}(\tilde{Z} \leq \cdot)$; $\mathbb{P}(Z^e \leq \cdot \mid X_1^n)$ and $\mathbb{P}(\tilde{Z} \leq \cdot)$; and $\mathbb{P}(Z^* \leq \cdot \mid X_1^n)$ and $\mathbb{P}(\tilde{Z} \leq \cdot)$. Once such bounds are obtained, we will have a bound on, say, the Kolmogorov distance between $\mathbb{P}(Z \leq \cdot)$ and $\mathbb{P}(Z^e \leq \cdot \mid X_1^n)$. By the following simple lemma, we see that to obtain such bounds from the coupling inequalities stated in Theorems 2.1-2.3, we need an anti-concentration inequality for $\tilde{Z}$, that is, an inequality bounding $\sup_{t \in \mathbb{R}} \mathbb{P}(|\tilde{Z} - t| \leq \varepsilon)$ for $\varepsilon > 0$.

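Lemma 2.1. Let $V, W$ be real-valued random variables such that $\mathbb{P}(|V - W| > r_1) \leq r_2$ for some constants $r_1, r_2 > 0$. Then we have

$$\sup_{t \in \mathbb{R}} |\mathbb{P}(V \leq t) - \mathbb{P}(W \leq t)| \leq \sup_{t \in \mathbb{R}} \mathbb{P}(|W - t| \leq r_1) + r_2.$$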
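Lemma 2.1 can be sanity-checked on simulated data. In the sketch below (all distributions are assumptions chosen for illustration), $W$ is standard normal and $V = W + U$ with $|U| \leq r_1/2$, so $\mathbb{P}(|V - W| > r_1) = 0$ and one may take $r_2 = 0$; the inequality of the lemma then holds pointwise in $t$ for the empirical distributions, evaluated on a common grid.

```python
import numpy as np

def ecdf(sorted_x, t):
    """Empirical P(X <= t) for each t, given the sorted sample."""
    return np.searchsorted(sorted_x, t, side="right") / len(sorted_x)

rng = np.random.default_rng(2)
n, r1, r2 = 100_000, 0.05, 0.0
W = rng.standard_normal(n)
V = W + rng.uniform(-r1 / 2, r1 / 2, size=n)   # |V - W| <= r1 / 2 < r1, so r2 = 0 works
Ws, Vs = np.sort(W), np.sort(V)

grid = np.linspace(-4.0, 4.0, 2001)
kolmogorov = np.max(np.abs(ecdf(Vs, grid) - ecdf(Ws, grid)))
# empirical P(|W - t| <= r1) = #{W in [t - r1, t + r1]} / n
conc = (np.searchsorted(Ws, grid + r1, side="right")
        - np.searchsorted(Ws, grid - r1, side="left")) / n
print(kolmogorov, conc.max() + r2)
```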
The proof of this lemma is immediate and hence omitted. In the case where $B(\cdot) = 0$, a useful anti-concentration inequality for $\tilde{Z}$ is found in Lemma A.1 in [6], which essentially follows from Theorem 3 in [8]. Lemma A.1 in [6] does not, however, cover non-centered Gaussian processes. Therefore, here we provide a new anti-concentration inequality that can be applied to non-centered Gaussian processes. The proof of the lemma can be found in Section 4.

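Lemma 2.2. Let $T$ be a non-empty set, and let $\ell^\infty(T)$ be the set of all bounded functions on $T$ endowed with the sup-norm. Let $X(t)$, $t \in T$, be a possibly non-centered tight Gaussian random element in $\ell^\infty(T)$ such that $\sigma^2 := \inf_{t \in T} \mathrm{Var}(X(t)) > 0$. Define $d(s, t) := \sqrt{E[(X(t) - X(s))^2]}$, $s, t \in T$, and for $\delta > 0$, define $\phi(\delta) := E[\sup_{(s,t) \in T_\delta} |X(t) - X(s)|]$, where $T_\delta = \{(s, t) : d(s, t) \leq \delta\}$. Then for every $\varepsilon > 0$,

$$\sup_{x \in \mathbb{R}} \mathbb{P}\Big(\Big|\sup_{t \in T} X(t) - x\Big| \leq \varepsilon\Big) \leq \inf_{\delta, r > 0} \Big\{\frac{2}{\sigma}\big(\varepsilon + \phi(\delta) + r\delta\big)\big(\sqrt{2 \log N(T, d, \delta)} + 2\big) + e^{-r^2/2}\Big\}.$$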


3. Auxiliary results for discretized processes 

This section states two auxiliary results for "discretized" processes that will be used to prove the theorems stated in Section 2.

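Theorem 3.1. Let $X_1, \dots, X_n$ be independent random vectors in $\mathbb{R}^p$ ($p \geq 2$) with finite absolute third moments, that is, $E[|X_{ij}|^3] < \infty$ for all $1 \leq i \leq n$ and $1 \leq j \leq p$. Define $\mu_i = E[X_i]$ and $\tilde{X}_i = X_i - \mu_i$, $1 \leq i \leq n$, and consider the statistic $Z = \max_{1 \leq j \leq p} n^{-1/2} \sum_{i=1}^n X_{ij}$. Let $Y_1, \dots, Y_n$ be independent random vectors in $\mathbb{R}^p$ with $Y_i \sim N(\mu_i, E[\tilde{X}_i \tilde{X}_i^T])$, and define $\tilde{Y}_i = Y_i - \mu_i$, $1 \leq i \leq n$, and $\tilde{Z} = \max_{1 \leq j \leq p} n^{-1/2} \sum_{i=1}^n Y_{ij}$. Then for every $\delta > 0$ and every Borel subset $A$ of $\mathbb{R}$, we have

$$\mathbb{P}(Z \in A) \leq \mathbb{P}(\tilde{Z} \in A^{C_7 \delta}) + \frac{C_8 \log^2 p}{\delta^3 \sqrt{n}} \big\{L_n + M_{n,X}(\delta) + M_{n,Y}(\delta)\big\},$$

where $C_7, C_8$ are universal positive constants, and

$$L_n = \max_{1 \leq j \leq p} \frac{1}{n} \sum_{i=1}^n E[|\tilde{X}_{ij}|^3],$$

$$M_{n,X}(\delta) = \frac{1}{n} \sum_{i=1}^n E\Big[\max_{1 \leq j \leq p} |\tilde{X}_{ij}|^3 \cdot 1\Big\{\max_{1 \leq j \leq p} |\tilde{X}_{ij}| > \delta\sqrt{n}/\log p\Big\}\Big],$$

$$M_{n,Y}(\delta) = \frac{1}{n} \sum_{i=1}^n E\Big[\max_{1 \leq j \leq p} |\tilde{Y}_{ij}|^3 \cdot 1\Big\{\max_{1 \leq j \leq p} |\tilde{Y}_{ij}| > \delta\sqrt{n}/\log p\Big\}\Big].$$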
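A small simulation illustrates the comparison in Theorem 3.1. As an assumption for this sketch, $X_{ij} = \mu_j + (E_{ij} - 1)$ with $E_{ij}$ i.i.d. standard exponential, so the coordinates are independent, skewed, and have unit variance; the Gaussian counterpart $Y_i \sim N(\mu, I_p)$ matches the first two moments, and $\tilde{Z}$ can be sampled exactly as $\max_j(\sqrt{n}\mu_j + \xi_j)$ with $\xi_j$ i.i.d. standard normal. The code reports a two-sample Kolmogorov distance, which is a cruder comparison than the enlargement inequality in the theorem.

```python
import numpy as np

def ks_distance(a, b):
    """Two-sample Kolmogorov distance sup_t |F_a(t) - F_b(t)| between empirical cdfs."""
    a, b = np.sort(a), np.sort(b)
    pts = np.concatenate([a, b])            # the sup is attained at a jump point
    Fa = np.searchsorted(a, pts, side="right") / len(a)
    Fb = np.searchsorted(b, pts, side="right") / len(b)
    return np.max(np.abs(Fa - Fb))

rng = np.random.default_rng(3)
n, p, reps = 200, 10, 2000
mu = np.linspace(-0.1, 0.1, p)              # hypothetical common mean vector mu_i = mu
# X_ij = mu_j + (Exp(1) - 1): independent, non-Gaussian, Var = 1, finite third moments
X = mu + rng.exponential(size=(reps, n, p)) - 1.0
Z = X.sum(axis=1).max(axis=1) / np.sqrt(n)  # Z = max_j n^{-1/2} sum_i X_ij
# Gaussian analogue: n^{-1/2} sum_i Y_ij = sqrt(n) mu_j + N(0, 1)
Z_tilde = (np.sqrt(n) * mu + rng.standard_normal((reps, p))).max(axis=1)
print(ks_distance(Z, Z_tilde))
```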

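Theorem 3.2. Let $X = (X_1, \dots, X_p)^T$ and $Y = (Y_1, \dots, Y_p)^T$ be random vectors in $\mathbb{R}^p$ ($p \geq 2$) with $X \sim N(\mu, \Sigma^X)$ and $Y \sim N(\mu, \Sigma^Y)$. Let $\Delta = \max_{1 \leq j,k \leq p} |\Sigma^X_{jk} - \Sigma^Y_{jk}|$, where $\Sigma^X_{jk}$ and $\Sigma^Y_{jk}$ denote the $(j,k)$-th elements of $\Sigma^X$ and $\Sigma^Y$, respectively. Define $Z = \max_{1 \leq j \leq p} X_j$ and $\tilde{Z} = \max_{1 \leq j \leq p} Y_j$. Then for every $\delta > 0$ and every Borel subset $A$ of $\mathbb{R}$,

$$\mathbb{P}(Z \in A) \leq \mathbb{P}(\tilde{Z} \in A^\delta) + C_9 \delta^{-1} \sqrt{\Delta \log p},$$

where $C_9 > 0$ is a universal constant.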
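The quantities in Theorem 3.2 are easy to compute in a simulation. The sketch below assumes, for illustration only, $\mu = 0$, $\Sigma^X = I_p$ and an equicorrelated $\Sigma^Y$ with correlation $\rho$, so that $\Delta = \rho$. For half-lines $A = (-\infty, t]$ one has $A^\delta = (-\infty, t + \delta]$, and the code estimates $\sup_t\{\mathbb{P}(Z \leq t) - \mathbb{P}(\tilde{Z} \leq t + \delta)\}$ next to $\delta^{-1}\sqrt{\Delta \log p}$ (the universal constant $C_9$ is not known explicitly and is left out).

```python
import numpy as np

rng = np.random.default_rng(4)
p, reps, rho, delta = 20, 100_000, 0.1, 0.1
mu = np.zeros(p)                                          # common mean
Sigma_X = np.eye(p)                                       # independent coordinates
Sigma_Y = (1 - rho) * np.eye(p) + rho * np.ones((p, p))  # equicorrelated coordinates
Delta = np.max(np.abs(Sigma_X - Sigma_Y))                 # equals rho here

Z = np.sort(rng.multivariate_normal(mu, Sigma_X, size=reps).max(axis=1))
Z_tilde = np.sort(rng.multivariate_normal(mu, Sigma_Y, size=reps).max(axis=1))

# sup over half-lines A = (-inf, t] of P(Z in A) - P(Z~ in A^delta), on a grid of t
grid = np.linspace(-2.0, 6.0, 801)
gap = np.max(np.searchsorted(Z, grid, side="right") / reps
             - np.searchsorted(Z_tilde, grid + delta, side="right") / reps)
print(Delta, gap, np.sqrt(Delta * np.log(p)) / delta)
```

In this particular configuration the larger correlations in $\Sigma^Y$ make $\tilde{Z}$ stochastically smaller (Slepian's inequality), so the estimated gap is essentially non-positive; the example is meant to show how $\Delta$ and the enlargement enter, not to probe the sharpness of the bound.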


4. Proofs for Section 2

Recall the definition of $K_n$:

$$K_n = K_n(v, A, b, \sigma, B, \eta) = \log N_B(\eta) + v\big(\log n \vee \log(Ab/\sigma)\big).$$


4.1. Proof of Theorem 2.1

The proof relies on the following form of Strassen's theorem.

Lemma 4.1 (Strassen's theorem). Let $\mu$ and $\nu$ be Borel probability measures on $\mathbb{R}$. Let $\varepsilon > 0$ and $\delta > 0$. Suppose that $\mu(A) \leq \nu(A^\delta) + \varepsilon$ for every Borel subset $A$ of $\mathbb{R}$. Let $V$ be a random variable with distribution $\mu$. Then there is a random variable $W$ with distribution $\nu$ such that $\mathbb{P}(|V - W| > \delta) \leq \varepsilon$.


Proof of Lemma 4.1. See Lemma 4.1 in [...].


Proof of Theorem 2.1. By Strassen's theorem, it is sufficient to prove that for every Borel subset $A$ of $\mathbb{R}$,

$$\mathbb{P}(Z \in A) \leq \mathbb{P}\{\tilde{Z} \in A^{C_1(\eta + \delta_n^{(1)})}\} + C_2(\gamma + n^{-1}), \tag{10}$$

where $\tilde{Z} = \sup_{f \in \mathcal{F}} (B(f) + G_P f)$. The rest of the proof is divided into several steps. In the following, $C$ denotes a positive constant that depends only on $q$; the value of $C$ may change from place to place.

Step 1. The first step is to "discretize" the empirical and Gaussian processes. To this end, take

$$\varepsilon = \sigma/(b n^{1/2}), \quad N = 2 \cdot N(\mathcal{F}, e_P, \varepsilon b) \cdot N_B(\eta).$$

Since $N(\mathcal{F}, e_P, \varepsilon b) \leq (4A/\varepsilon)^v$ by approximation of $P$ by a finitely discrete probability measure and assumption (B), we have $\log N \leq CK_n$. By definition, there exist $f_1, \dots, f_N \in \mathcal{F}$ such that for every $f \in \mathcal{F}$, there exists $1 \leq j \leq N$ with $e_P(f, f_j) \leq \varepsilon b$ and $|B(f) - B(f_j)| \leq \eta$. Note that under the present assumptions, the Gaussian process $G_P$ can be extended to the linear hull of $\mathcal{F}$ in such a way that $G_P$ has linear sample paths [see [12, Theorem 3.1]]. Hence, letting $\mathcal{F}_\varepsilon = \{f - g : f, g \in \mathcal{F}, e_P(f, g) \leq \varepsilon b\}$, we conclude that

$$0 \leq \sup_{f \in \mathcal{F}} (B(f) + \mathbb{G}_n f) - \max_{1 \leq j \leq N} (B(f_j) + \mathbb{G}_n f_j) \leq \eta + \|\mathbb{G}_n\|_{\mathcal{F}_\varepsilon},$$

$$0 \leq \sup_{f \in \mathcal{F}} (B(f) + G_P f) - \max_{1 \leq j \leq N} (B(f_j) + G_P f_j) \leq \eta + \|G_P\|_{\mathcal{F}_\varepsilon}.$$


Step 2. Here we wish to show that

$$\mathbb{P}\big\{\|G_P\|_{\mathcal{F}_\varepsilon} > C\sqrt{\sigma^2 K_n/n}\big\} \leq 2n^{-1}. \tag{11}$$

This follows from the Borell-Sudakov-Tsirel'son inequality [see [25, Proposition A.2.1]] complemented with Dudley's maximal inequality for Gaussian processes [see [25, Corollary 2.2.8]].

First, by the Borell-Sudakov-Tsirel'son inequality, we have

$$\mathbb{P}\big\{\|G_P\|_{\mathcal{F}_\varepsilon} > E[\|G_P\|_{\mathcal{F}_\varepsilon}] + \varepsilon b\sqrt{2 \log n}\big\} \leq 2n^{-1}.$$

Second, by Dudley's maximal inequality together with the fact that $N(\mathcal{F}_\varepsilon, e_P, \tau) \leq N^2(\mathcal{F}, e_P, \tau/2) \leq (8Ab/\tau)^{2v}$, we have

$$E[\|G_P\|_{\mathcal{F}_\varepsilon}] \leq C\varepsilon b\sqrt{v \log(8A/\varepsilon)} \leq C\sqrt{\sigma^2 K_n/n}.$$

Combining these inequalities, together with the fact that $\log n \leq K_n$, leads to the desired inequality.
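Step 3. We wish to show that

$$\mathbb{P}\big\{\|\mathbb{G}_n\|_{\mathcal{F}_\varepsilon} > CbK_n/(\gamma^{1/q} n^{1/2 - 1/q})\big\} \leq \gamma. \tag{12}$$

Applying Lemma 6.1 with $\alpha = \gamma^{-1/q}$ and $t = \gamma^{-2/q}$, we have with probability at least $1 - \gamma$,

$$\|\mathbb{G}_n\|_{\mathcal{F}_\varepsilon} \leq C\big\{\gamma^{-1/q} E[\|\mathbb{G}_n\|_{\mathcal{F}_\varepsilon}] + (\sigma_\varepsilon + n^{-1/2}\|M_\varepsilon\|_q)\gamma^{-1/q} + n^{-1/2}\|M_\varepsilon\|_2 \gamma^{-1/q}\big\},$$

where $\sigma_\varepsilon := \sup_{f \in \mathcal{F}_\varepsilon} (Pf^2)^{1/2} \leq \varepsilon b = \sigma/n^{1/2}$ and $M_\varepsilon := 2\max_{1 \leq i \leq n} F(X_i)$. Here $\|M_\varepsilon\|_2 \leq \|M_\varepsilon\|_q \leq 2n^{1/q} b$. In addition, by Lemma 6.2, we have

$$E[\|\mathbb{G}_n\|_{\mathcal{F}_\varepsilon}] \leq C\big\{(\sigma^2 K_n/n)^{1/2} + bK_n/n^{1/2 - 1/q}\big\} \leq CbK_n/n^{1/2 - 1/q}.$$

Combining these inequalities leads to (12).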

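Step 4. Let $Z^\varepsilon = \max_{1 \leq j \leq N} (B(f_j) + \mathbb{G}_n f_j)$ and $\tilde{Z}^\varepsilon = \max_{1 \leq j \leq N} (B(f_j) + G_P f_j)$. Here we apply Theorem 3.1 to show that whenever

$$\delta \geq 2c\sigma n^{-1/2} (\log N)^{3/2} \cdot (\log n) \tag{13}$$

for some universal constant $c > 0$, we have for every Borel subset $A$ of $\mathbb{R}$,

$$\mathbb{P}(Z^\varepsilon \in A) \leq \mathbb{P}(\tilde{Z}^\varepsilon \in A^{C\delta}) + C\Big(\frac{b\sigma^2 K_n^2}{\delta^3 n^{1/2}} + \frac{b^q K_n^q}{\delta^q n^{q/2 - 1}} + \frac{1}{n}\Big). \tag{14}$$

Let $\tilde{X}_i = (f_j(X_i) - Pf_j)_{1 \leq j \leq N}$, $1 \leq i \leq n$, and let $\tilde{Y} = (G_P f_j)_{1 \leq j \leq N}$. Then, as $X_1, \dots, X_n$ are i.i.d.,

$$L_n = \max_{1 \leq j \leq N} E[|\tilde{X}_{1j}|^3] = \sup_{f \in \mathcal{F}} E[|f(X) - Pf|^3] \leq 8\sup_{f \in \mathcal{F}} P|f|^3 \leq 8\sigma^2 b,$$

$$M_{n,X}(\delta) = E\Big[\max_{1 \leq j \leq N} |\tilde{X}_{1j}|^3 \cdot 1\Big\{\max_{1 \leq j \leq N} |\tilde{X}_{1j}| > \delta\sqrt{n}/\log N\Big\}\Big] \leq \frac{\log^{q-3} N}{(\delta\sqrt{n})^{q-3}} E\Big[\max_{1 \leq j \leq N} |\tilde{X}_{1j}|^q\Big] \leq \frac{2^q b^q \log^{q-3} N}{(\delta\sqrt{n})^{q-3}}.$$

To bound $M_{n,Y}(\delta)$, let $\|\cdot\|_{\psi_1}$ denote the Orlicz norm associated with the Young modulus $\psi_1(x) = e^x - 1$, that is, $\|\xi\|_{\psi_1} = \inf\{u > 0 : E[\psi_1(|\xi|/u)] \leq 1\}$. Then it is routine to verify that there exists a universal constant $c > 0$ such that $\|\max_{1 \leq j \leq N} |\tilde{Y}_j|\|_{\psi_1} \leq c\sigma\sqrt{\log N}$. Hence, by Markov's inequality, for every $x > 0$,

$$\mathbb{P}\Big(\max_{1 \leq j \leq N} |\tilde{Y}_j| > x\Big) \leq 2\exp\Big(-\frac{x}{c\sigma\sqrt{\log N}}\Big).$$

Therefore, by Lemma 6.6, whenever $\delta \geq 2c\sigma n^{-1/2} (\log^{3/2} N) \cdot (\log n)$,

$$\begin{aligned}
M_{n,Y}(\delta) &= E\Big[\max_{1 \leq j \leq N} |\tilde{Y}_j|^3 \cdot 1\Big\{\max_{1 \leq j \leq N} |\tilde{Y}_j| > \delta\sqrt{n}/\log N\Big\}\Big] \\
&\leq 12\big(\delta\sqrt{n}/\log N + c\sigma\sqrt{\log N}\big)^3 \exp\Big(-\frac{\delta\sqrt{n}}{c\sigma \log^{3/2} N}\Big) \leq Cn^{-2}(\delta\sqrt{n}/\log N)^3.
\end{aligned}$$

Application of Theorem 3.1 with these bounds, together with the bound $\log N \leq CK_n$, leads to (14).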
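The $\psi_1$ bound $\|\max_{j} |\tilde{Y}_j|\|_{\psi_1} \leq c\sigma\sqrt{\log N}$ used in Step 4 can be illustrated numerically. The sketch below estimates the $\psi_1$ norm from Monte Carlo draws by bisection. For simplicity it assumes independent coordinates $\tilde{Y}_j \sim N(0, \sigma^2)$, which is a special case of the Gaussian vector in the proof. It prints the ratio to $\sigma\sqrt{\log N}$, which should stay bounded as $N$ grows.

```python
import numpy as np

def psi1_norm(samples, iters=60):
    """Empirical Orlicz psi_1 norm inf{u > 0 : mean(exp(|xi| / u) - 1) <= 1}, by bisection."""
    x = np.abs(np.asarray(samples, dtype=float))
    # Jensen: mean(exp(x/u)) >= exp(mean(x)/u), so u = mean(x)/log 2 is not above the norm;
    # mean(exp(x/u)) <= exp(max(x)/u), so u = max(x)/log 2 is feasible.
    lo, hi = x.mean() / np.log(2.0), x.max() / np.log(2.0)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if np.mean(np.exp(x / mid)) <= 2.0:
            hi = mid
        else:
            lo = mid
    return hi

rng = np.random.default_rng(1)
sigma = 1.0
for N in (10, 100, 1000):
    # 5,000 draws of max_j |Y_j| with Y_j i.i.d. N(0, sigma^2)
    m = np.abs(rng.normal(scale=sigma, size=(5_000, N))).max(axis=1)
    print(N, psi1_norm(m) / (sigma * np.sqrt(np.log(N))))
```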

Step 5. In the previous step, take

$$\delta = C\Big\{\frac{(b\sigma^2 K_n^2)^{1/3}}{\gamma^{1/3} n^{1/6}} + \frac{bK_n}{\gamma^{1/q} n^{1/2 - 1/q}}\Big\},$$

where $C > 0$ is a large enough constant. It is easy to check that for this choice of $\delta$, (13) holds under the condition $K_n^3 \leq n$. Indeed, since $q \geq 4$, $b \geq \sigma$, $\log n \leq K_n$ and $\log N \leq CK_n$, we have $2c\sigma n^{-1/2} (\log^{3/2} N) \cdot (\log n) \leq C'' b^{1/3} \sigma^{2/3} K_n^{2/3}/(\gamma^{1/3} n^{1/6}) \leq \delta$. Therefore, by Step 4, we have for every Borel subset $A$ of $\mathbb{R}$,

$$\mathbb{P}(Z^\varepsilon \in A) \leq \mathbb{P}(\tilde{Z}^\varepsilon \in A^{C\delta}) + C(\gamma + n^{-1}).$$

The desired inequality (10) thus follows from combining Steps 1-5. ■


4.2. Proof of Theorem 2.2

The proof of Theorem 2.2 relies on a conditional version of Strassen's theorem due to [20].

Lemma 4.2. Let $V$ be a real-valued random variable defined on a probability space $(\Omega, \mathcal{A}, \mathbb{P})$, and let $\mathcal{C}$ be a countably generated sub $\sigma$-field of $\mathcal{A}$. Assume that there exists a uniform random variable on $[0,1]$ independent of $\mathcal{C} \vee \sigma(V)$. Let $G(\cdot \mid \mathcal{C})$ be a regular conditional distribution on the Borel $\sigma$-field of $\mathbb{R}$ given $\mathcal{C}$, and suppose that for some $\delta > 0$ and $\varepsilon > 0$,

$$E\Big[\sup_A \big\{\mathbb{P}(V \in A \mid \mathcal{C}) - G(A^\delta \mid \mathcal{C})\big\}\Big] \leq \varepsilon,$$

where $\sup_A$ is taken over all Borel subsets $A$ of $\mathbb{R}$. Then there exists a random variable $W$ such that the conditional distribution of $W$ given $\mathcal{C}$ coincides with $G(\cdot \mid \mathcal{C})$, and moreover $\mathbb{P}(|V - W| > \delta) \leq \varepsilon$.

Proof. See Theorem 4 in [20]. ■

Proof of Theorem 2.2. Here $C$ denotes a positive constant that depends only on $q$; the value of $C$ may change from place to place. In addition, to ease the notation, we write $a \lesssim b$ if $a \leq Cb$. By Lemma 4.2, since $\sigma(X_1^n)$ is countably generated by the construction of the probability space (in particular, recall that we have assumed that $S$ is a separable metric space), it is sufficient to find an event $E \in \sigma(X_1^n)$ such that $\mathbb{P}(E) \geq 1 - \gamma - n^{-1}$ and, on this event, the inequality

$$\mathbb{P}(Z^e \in A \mid X_1^n) \leq \mathbb{P}\{\tilde{Z} \in A^{C(\eta + \delta_n^{(2)})}\} + C(\gamma + n^{-1}) \tag{15}$$

holds for every Borel subset $A$ of $\mathbb{R}$, where $\tilde{Z} = \sup_{f \in \mathcal{F}} (B(f) + G_P f)$.

We first specify such an event, and then show that on this event, (15) holds for every Borel subset $A$ of $\mathbb{R}$. Applying Lemma 6.1 with $\alpha = \gamma^{-1/q}$ and $t = (\gamma/2)^{-2/q}$, we have with probability at least $1 - \gamma/2$,

$$\|\mathbb{G}_n\|_{\mathcal{F}} \lesssim \gamma^{-1/q} E[\|\mathbb{G}_n\|_{\mathcal{F}}] + (\sigma + n^{-1/2}\|M\|_q)\gamma^{-1/q} + n^{-1/2}\|M\|_2 \gamma^{-1/q},$$

where $M := \max_{1 \leq i \leq n} F(X_i)$ satisfies $\|M\|_2 \leq \|M\|_q = (E[M^q])^{1/q} \leq n^{1/q} b$. In addition, by Lemma 6.2,

$$E[\|\mathbb{G}_n\|_{\mathcal{F}}] \lesssim \sigma K_n^{1/2} + \|M\|_2 K_n n^{-1/2} \lesssim \sigma K_n^{1/2} + bK_n n^{-1/2 + 1/q}.$$

Hence with probability at least $1 - \gamma/2$,

$$\|\mathbb{G}_n\|_{\mathcal{F}} \lesssim \sigma K_n^{1/2}/\gamma^{1/q} + bK_n/(\gamma^{1/q} n^{1/2 - 1/q}). \tag{16}$$

Moreover, applying Lemma 6.1 again with $\alpha = \gamma^{-2/q}$ and $t = (\gamma/2)^{-4/q}$ to the class $\mathcal{F} \cdot \mathcal{F} = \{f \cdot g : f, g \in \mathcal{F}\}$, we have with probability at least $1 - \gamma/2$,

$$\|\mathbb{G}_n\|_{\mathcal{F} \cdot \mathcal{F}} \lesssim \gamma^{-2/q} E[\|\mathbb{G}_n\|_{\mathcal{F} \cdot \mathcal{F}}] + (\sigma_2 + n^{-1/2}\|M^2\|_{q/2})\gamma^{-2/q} + n^{-1/2}\|M^2\|_2 \gamma^{-2/q},$$

where $\sigma_2^2 := \sup_{f \in \mathcal{F} \cdot \mathcal{F}} Pf^2 \leq \sup_{f \in \mathcal{F}} Pf^4 \leq b^2\sigma^2$. In addition, $\|M^2\|_2 \leq \|M^2\|_{q/2} = (E[M^q])^{2/q} \leq n^{2/q} b^2$, and as shown in the proof of Corollary 2.2 in [...], $E[\|\mathbb{G}_n\|_{\mathcal{F} \cdot \mathcal{F}}] \lesssim b\sigma K_n^{1/2} + b^2 K_n n^{-1/2 + 2/q}$. Hence with probability at least $1 - \gamma/2$,

$$\|\mathbb{G}_n\|_{\mathcal{F} \cdot \mathcal{F}} \lesssim b\sigma K_n^{1/2}/\gamma^{2/q} + b^2 K_n/(\gamma^{2/q} n^{1/2 - 2/q}). \tag{17}$$

Finally, by Markov's inequality, with probability at least $1 - n^{-1}$,

$$\|F\|_{P_n,2} \leq n^{1/2} \|F\|_{P,2}. \tag{18}$$

Define $E$ as the intersection of the events in (16), (17), and (18). Then $E \in \sigma(X_1^n)$ and $\mathbb{P}(E) \geq 1 - \gamma - n^{-1}$. The rest of the proof, which is divided into several steps, is devoted to proving (15) for each fixed $X_1, \dots, X_n$ satisfying (16)-(18).

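In the following, we use the notation introduced in Step 1 of the proof of Theorem 2.1. Then

$$0 \leq \sup_{f \in \mathcal{F}} (B(f) + \mathbb{G}_n^e f) - \max_{1 \leq j \leq N} (B(f_j) + \mathbb{G}_n^e f_j) \leq \eta + \|\mathbb{G}_n^e\|_{\mathcal{F}_\varepsilon}, \tag{19}$$

$$0 \leq \sup_{f \in \mathcal{F}} (B(f) + G_P f) - \max_{1 \leq j \leq N} (B(f_j) + G_P f_j) \leq \eta + \|G_P\|_{\mathcal{F}_\varepsilon}. \tag{20}$$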


Step 1. By Step 2 of the proof of Theorem 2.1, we have $\mathbb{P}(\|G_P\|_{\mathcal{F}_\varepsilon} > C\sqrt{\sigma^2 K_n/n}) \leq 2n^{-1}$.


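Step 2. Here we wish to show that on the event $E$,

$$\mathbb{P}\Big\{\|\mathbb{G}_n^e\|_{\mathcal{F}_\varepsilon} > C\Big\{\frac{(b\sigma K_n^{3/2})^{1/2}}{\gamma^{1/q} n^{1/4}} + \frac{bK_n}{\gamma^{1/q} n^{1/2 - 1/q}}\Big\} \,\Big|\, X_1^n\Big\} \leq 2n^{-1}. \tag{21}$$

Fix any $X_1, \dots, X_n$ satisfying (16)-(18). Let us write $(\mathcal{F} - \mathcal{F})^2 := \{(f - g)^2 : f, g \in \mathcal{F}\}$. Then observe that

$$\begin{aligned}
\sigma_n^2 := \sup_{f \in \mathcal{F}_\varepsilon} P_n f^2 &\leq \sup_{f \in \mathcal{F}_\varepsilon} E[f(X)^2] + n^{-1/2}\|\mathbb{G}_n\|_{(\mathcal{F} - \mathcal{F})^2} \\
&\leq (\varepsilon b)^2 + 4n^{-1/2}\|\mathbb{G}_n\|_{\mathcal{F} \cdot \mathcal{F}} \lesssim \sigma^2/n + b\sigma K_n^{1/2}/(\gamma^{2/q} n^{1/2}) + b^2 K_n/(\gamma^{2/q} n^{1 - 2/q}) \\
&\lesssim b\sigma K_n^{1/2}/(\gamma^{2/q} n^{1/2}) + b^2 K_n/(\gamma^{2/q} n^{1 - 2/q}),
\end{aligned}$$

where in the second line, we used the inequality $\|\mathbb{G}_n\|_{(\mathcal{F} - \mathcal{F})^2} = \sup_{f, g \in \mathcal{F}} |\mathbb{G}_n (f - g)^2| \leq 4\|\mathbb{G}_n\|_{\mathcal{F} \cdot \mathcal{F}}$. Now, note that conditional on $X_1^n$, $\mathbb{G}_n^e$ is a centered Gaussian process, and $E[(\mathbb{G}_n^e f)^2 \mid X_1^n] \leq P_n f^2 \leq \sigma_n^2$ for all $f \in \mathcal{F}_\varepsilon$. Hence by the Borell-Sudakov-Tsirel'son inequality [see [25, Proposition A.2.1]],

$$\mathbb{P}\big\{\|\mathbb{G}_n^e\|_{\mathcal{F}_\varepsilon} > E[\|\mathbb{G}_n^e\|_{\mathcal{F}_\varepsilon} \mid X_1^n] + \sigma_n\sqrt{2 \log n} \,\big|\, X_1^n\big\} \leq 2n^{-1}.$$

To bound $E[\|\mathbb{G}_n^e\|_{\mathcal{F}_\varepsilon} \mid X_1^n]$, observe that

$$\|\mathbb{G}_n^e\|_{\mathcal{F}_\varepsilon} \leq \sup_{f \in \mathcal{F}_\varepsilon} \Big|\frac{1}{\sqrt{n}} \sum_{i=1}^n e_i f(X_i)\Big| + \sup_{f \in \mathcal{F}_\varepsilon} \Big|\frac{1}{\sqrt{n}} \sum_{i=1}^n e_i\Big| \cdot |P_n f| =: I + II.$$

By Dudley's maximal inequality [see [25, Corollary 2.2.8]], together with the fact that $N(\mathcal{F}_\varepsilon, e_{P_n}, 2\tau\|F\|_{P_n,2}) \leq N^2(\mathcal{F}, e_{P_n}, \tau\|F\|_{P_n,2}) \leq (A/\tau)^{2v}$, we have

$$\begin{aligned}
E[I \mid X_1^n] &\lesssim \int_0^{\sigma_n \vee (\sigma/n^{1/2})} \sqrt{1 + \log N(\mathcal{F}_\varepsilon, e_{P_n}, \tau)} \, d\tau \\
&\lesssim \big(\sigma_n \vee (\sigma/n^{1/2})\big)\sqrt{v \log(2n^{1/2} A\|F\|_{P_n,2}/\sigma)} \lesssim \big(\sigma_n \vee (\sigma/n^{1/2})\big) K_n^{1/2}.
\end{aligned}$$

Meanwhile, since $\|P_n\|_{\mathcal{F}_\varepsilon} \leq \sigma_n$ by Jensen's inequality, we have

$$E[II \mid X_1^n] \leq \|P_n\|_{\mathcal{F}_\varepsilon} \, E\Big[\Big|\frac{1}{\sqrt{n}} \sum_{i=1}^n e_i\Big|\Big] \leq \sigma_n.$$

Combining these inequalities leads to (21).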

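Step 3. Let $Z^{e,\varepsilon} = \max_{1 \leq j \leq N} (B(f_j) + \mathbb{G}_n^e f_j)$ and $\tilde{Z}^\varepsilon = \max_{1 \leq j \leq N} (B(f_j) + G_P f_j)$. We wish to show that on the event $E$, the inequality

$$\mathbb{P}(Z^{e,\varepsilon} \in A \mid X_1^n) \leq \mathbb{P}(\tilde{Z}^\varepsilon \in A^\delta) + \frac{C}{\delta}\Big\{\frac{(b\sigma K_n^{3/2})^{1/2}}{\gamma^{1/q} n^{1/4}} + \frac{bK_n}{\gamma^{1/q} n^{1/2 - 1/q}}\Big\}$$

holds for every $\delta > 0$ and every Borel subset $A$ of $\mathbb{R}$. Let

$$\Delta := \max_{1 \leq j,k \leq N} \big|\{P_n(f_j f_k) - (P_n f_j)(P_n f_k)\} - \{P(f_j f_k) - (Pf_j)(Pf_k)\}\big|,$$

and observe that

$$|P_n(f_j f_k) - P(f_j f_k)| \leq n^{-1/2}\|\mathbb{G}_n\|_{\mathcal{F} \cdot \mathcal{F}},$$

$$|(P_n f_j)(P_n f_k) - (Pf_j)(Pf_k)| \leq n^{-1}\|\mathbb{G}_n\|_{\mathcal{F}}^2 + 2\sigma n^{-1/2}\|\mathbb{G}_n\|_{\mathcal{F}}.$$

Hence, as $K_n \leq n$, it is not difficult to check that on the event $E$,

$$\Delta \lesssim b\sigma K_n^{1/2}/(\gamma^{2/q} n^{1/2}) + b^2 K_n/(\gamma^{2/q} n^{1 - 2/q}).$$

The assertion of this step now follows from Theorem 3.2 (recall that $\log N \lesssim K_n$).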
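The two elementary inequalities behind the bound on $\Delta$ in Step 3 can be checked on simulated data. In the sketch below (an illustrative assumption, not the paper's setting) $f_j = 1\{x \leq t_j\}$ and $X$ is uniform on $[0,1]$, so $P(f_j f_k) = \min(t_j, t_k)$ and each product $f_j f_k$ is again a grid function; with $\sigma^2 = \max_j Pf_j^2$, the computed $\Delta$ never exceeds $n^{-1/2}\|\mathbb{G}_n\|_{\mathcal{F}\cdot\mathcal{F}} + n^{-1}\|\mathbb{G}_n\|_{\mathcal{F}}^2 + 2\sigma n^{-1/2}\|\mathbb{G}_n\|_{\mathcal{F}}$.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 1000
t = np.linspace(0.05, 1.0, 20)                 # illustrative class f_t = 1{x <= t}
x = rng.uniform(size=n)                         # X ~ Uniform[0, 1), so Pf_t = t
fx = (x[:, None] <= t[None, :]).astype(float)

Pn_f = fx.mean(axis=0)
Pn_ff = fx.T @ fx / n                           # P_n(f_j f_k)
P_f = t
P_ff = np.minimum.outer(t, t)                   # P(f_j f_k) = min(t_j, t_k)

Delta = np.max(np.abs((Pn_ff - np.outer(Pn_f, Pn_f)) - (P_ff - np.outer(P_f, P_f))))

# f_s f_t = f_{min(s, t)}, so ||G_n||_{F.F} equals ||G_n||_F on this grid
Gn_sup = np.sqrt(n) * np.max(np.abs(Pn_f - P_f))
sigma = np.sqrt(np.max(t))                      # sup_f (P f^2)^{1/2}, and |Pf| <= sigma
bound = Gn_sup / np.sqrt(n) + Gn_sup**2 / n + 2 * sigma * Gn_sup / np.sqrt(n)
print(Delta, bound)
```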
Step 4. Take

$$\delta = \delta_n^{(2)} = \frac{(b\sigma K_n^{3/2})^{1/2}}{\gamma^{1 + 1/q} n^{1/4}} + \frac{bK_n}{\gamma^{1 + 1/q} n^{1/2 - 1/q}}.$$

Then the desired inequality (15) (with suitable $C_3, C_4$) follows from combining (19), (20), and Steps 1, 2, and 3 with this choice of $\delta$. ■

4.3. Proof of Theorem 2.3

Here $C$ denotes a positive constant depending only on $q$; $C$ may change from place to place. In addition, to ease the notation, we write $a \lesssim b$ if $a \leq Cb$. In the proof below, we find an event $E \in \sigma(X_1^n)$ such that $\mathbb{P}(E) \geq 1 - \gamma - n^{-1}$ and, on this event, the inequality

$$\mathbb{P}(Z^* \in A \mid X_1^n) \leq \mathbb{P}\{Z^e \in A^{C(\eta + \delta_n^{(3)})} \mid X_1^n\} + C(\gamma + n^{-1}) \tag{22}$$

holds for every Borel subset $A$ of $\mathbb{R}$, where $Z^e = \sup_{f \in \mathcal{F}} (B(f) + \mathbb{G}_n^e f)$. Combining this inequality with (15), which is established in the proof of Theorem 2.2 (and which holds on a possibly different event $E' \in \sigma(X_1^n)$ satisfying $\mathbb{P}(E') \geq 1 - \gamma - n^{-1}$), the proof is completed by applying Lemma 4.2.

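We first specify the event $E$. We use the same notation as introduced in Step 1 of the proof of Theorem 2.1. Then

$$0 \leq \sup_{f \in \mathcal{F}} (B(f) + \mathbb{G}_n^* f) - \max_{1 \leq j \leq N} (B(f_j) + \mathbb{G}_n^* f_j) \leq \eta + \|\mathbb{G}_n^*\|_{\mathcal{F}_\varepsilon}, \tag{23}$$

$$0 \leq \sup_{f \in \mathcal{F}} (B(f) + \mathbb{G}_n^e f) - \max_{1 \leq j \leq N} (B(f_j) + \mathbb{G}_n^e f_j) \leq \eta + \|\mathbb{G}_n^e\|_{\mathcal{F}_\varepsilon}. \tag{24}$$

In addition, as in the proof of Theorem 2.2, with probability at least $1 - \gamma/4$,

$$\|\mathbb{G}_n\|_{\mathcal{F}} \lesssim \sigma K_n^{1/2}/\gamma^{1/q} + bK_n/(\gamma^{1/q} n^{1/2 - 1/q}); \tag{25}$$

with probability at least $1 - \gamma/4$,

$$\|\mathbb{G}_n\|_{\mathcal{F} \cdot \mathcal{F}} \lesssim b\sigma K_n^{1/2}/\gamma^{2/q} + b^2 K_n/(\gamma^{2/q} n^{1/2 - 2/q}); \tag{26}$$

and with probability at least $1 - n^{-1}$,

$$\|F\|_{P_n,2} \leq n^{1/2} \|F\|_{P,2}. \tag{27}$$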
Here $\mathcal{F} \cdot \mathcal{F} = \{f \cdot g : f, g \in \mathcal{F}\}$. Moreover, by the triangle inequality,

$$\max_{1 \leq j \leq N} \sum_{i=1}^n |f_j(X_i) - P_n f_j|^3 \lesssim \max_{1 \leq j \leq N} \sum_{i=1}^n |f_j(X_i)|^3,$$

and applying Lemma 6.5, we have with probability at least $1 - \gamma/4$,

$$\max_{1 \leq j \leq N} \sum_{i=1}^n |f_j(X_i)|^3 \lesssim E\Big[\max_{1 \leq j \leq N} \sum_{i=1}^n |f_j(X_i)|^3\Big] + \gamma^{-3/q} \|M^3\|_{q/3},$$

where $M := \max_{1 \leq i \leq n} \max_{1 \leq j \leq N} |f_j(X_i)| \leq \max_{1 \leq i \leq n} F(X_i)$ is such that $\|M^3\|_{q/3} \leq n^{3/q} b^3$. In addition, by Lemma 6.4,

$$E\Big[\max_{1 \leq j \leq N} \sum_{i=1}^n |f_j(X_i)|^3\Big] \lesssim n\sigma^2 b + E[M^3] \log N \lesssim n\sigma^2 b + n^{3/q} b^3 K_n.$$

Therefore, with probability at least $1 - \gamma/4$,

$$\max_{1 \leq j \leq N} \sum_{i=1}^n |f_j(X_i) - P_n f_j|^3/n \lesssim \sigma^2 b + b^3 K_n/(\gamma^{3/q} n^{1 - 3/q}). \tag{28}$$

Finally, by Markov's inequality, with probability at least $1 - \gamma/4$,

$$\max_{1 \leq i \leq n} \max_{1 \leq j \leq N} |f_j(X_i) - P_n f_j| \leq 2\max_{1 \leq i \leq n} F(X_i) \lesssim \gamma^{-1/q} n^{1/q} b. \tag{29}$$

Define $E$ as the intersection of the events in (25)-(29). Then $E \in \sigma(X_1^n)$ and $\mathbb{P}(E) \geq 1 - \gamma - n^{-1}$. In the rest of the proof, which is divided into several steps, we prove (22) for each fixed $X_1, \dots, X_n$ satisfying (25)-(29).

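Step 1. By Step 2 in the proof of Theorem 2.2, on the event $E$,

$$\mathbb{P}\Big\{\|\mathbb{G}_n^e\|_{\mathcal{F}_\varepsilon} > C\Big\{\frac{(b\sigma K_n^{3/2})^{1/2}}{\gamma^{1/q} n^{1/4}} + \frac{bK_n}{\gamma^{1/q} n^{1/2 - 1/q}}\Big\} \,\Big|\, X_1^n\Big\} \leq 2n^{-1}.$$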

Step 2. Here we wish to show that on the event $E$,

$$\mathbb{P}\Big\{\|\mathbb{G}_n^*\|_{\mathcal{F}_\varepsilon} > C\Big\{\frac{(b\sigma K_n^{3/2})^{1/2}}{\gamma^{1/q} n^{1/4}} + \frac{bK_n}{\gamma^{1/q} n^{1/2 - 1/q}}\Big\} \,\Big|\, X_1^n\Big\} \leq n^{-1}. \tag{30}$$

Note that conditional on $X_1^n$, $\mathbb{G}_n^*$ is the empirical process associated with $n$ i.i.d. observations from the empirical distribution $P_n$. When restricted to the domain $\{X_1, \dots, X_n\}$, the class of functions $\mathcal{F}$ has a constant envelope $\max_{1 \leq i \leq n} F(X_i) \lesssim \gamma^{-1/q} n^{1/q} b$. Moreover, by the same arguments as those used in Step 2 of the proof of Theorem 2.2,

$$\sigma_n^2 := \sup_{f \in \mathcal{F}_\varepsilon} P_n f^2 \lesssim b\sigma K_n^{1/2}/(\gamma^{2/q} n^{1/2}) + b^2 K_n/(\gamma^{2/q} n^{1 - 2/q}).$$

Hence the inequality (30) follows from application of Talagrand's inequality (Lemma 6.3) with $t = \log n$.
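The identification of $\mathbb{G}_n^*$ with the empirical process of a resample from $P_n$, used in Step 2, can be verified directly: drawing indices uniformly with replacement and counting them gives multinomial weights $N_i$. The sketch below checks that $\sqrt{n}(P_n^* f - P_n f)$ computed from the resample coincides with $n^{-1/2}\sum_i (N_i - 1) f(X_i)$; the class of half-line indicators and the uniform data are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 300
x = rng.uniform(size=n)
t = np.linspace(0.0, 1.0, 41)
fx = (x[:, None] <= t[None, :]).astype(float)   # f_t(X_i) for the class f_t = 1{x <= t}
Pn_f = fx.mean(axis=0)

# Bootstrap sample X_1^*, ..., X_n^* drawn i.i.d. from P_n by resampling indices
idx = rng.integers(0, n, size=n)
G_star_resample = np.sqrt(n) * (fx[idx].mean(axis=0) - Pn_f)   # n^{-1/2} sum (f(X_i^*) - P_n f)

# The same process written with the multinomial counts N_i = #{k : idx_k = i}
N = np.bincount(idx, minlength=n)
G_star_weights = (N - 1) @ fx / np.sqrt(n)
print(np.allclose(G_star_resample, G_star_weights))   # True
```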

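Step 3. Let $Z^{*,\varepsilon} = \max_{1\le j\le N}(B(f_j) + \mathbb{G}^*_n f_j)$ and $Z^{e,\varepsilon} = \max_{1\le j\le N}(B(f_j) + \mathbb{G}^e_n f_j)$. Here we apply Theorem 3.1 to show the following. Whenever

$$\delta \ge C\Bigl(\frac{b\log N}{\gamma^{1/q}n^{1/2-1/q}} + \frac{\sigma(\log^{3/2}N)\cdot(\log n)}{n^{1/2}}\Bigr) \tag{31}$$

for some sufficiently large $C > 0$, then on the event $\mathcal{E}$ the inequality

$$\mathrm{P}(Z^{*,\varepsilon} \in A \mid X_1^n) \le \mathrm{P}(Z^{e,\varepsilon} \in A^{C\delta} \mid X_1^n) + C\Bigl(\frac{b\sigma^2K_n^2}{\delta^3n^{1/2}} + \frac{b^3K_n^3}{\delta^3\gamma^{3/q}n^{3/2-3/q}} + \frac{1}{n}\Bigr)$$

holds for every such $\delta$ and every Borel subset $A$ of $\mathbb{R}$. Let $\check X_i = (f_j(X_i) - P_n f_j)_{1\le j\le N}$, $1 \le i \le n$, and let $Y = (\mathbb{G}^e_n f_j)_{1\le j\le N}$. Then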

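$$L_n = \max_{1\le j\le N}\sum_{i=1}^n |\check X_{ij}|^3/n \lesssim \sigma^2 b + \frac{b^3K_n}{\gamma^{3/q}n^{1-3/q}},$$

$$M_{n,X}(\delta) = n^{-1}\sum_{i=1}^n \max_{1\le j\le N}|\check X_{ij}|^3\cdot 1\Bigl\{\max_{1\le j\le N}|\check X_{ij}| > \delta\sqrt{n}/\log N\Bigr\} = 0.$$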


The last equality follows from (29), since $\delta\sqrt{n}/\log N \ge C\gamma^{-1/q}n^{1/q}b$. Moreover, $\mathrm{E}[Y_j^2 \mid X_1^n] \le P_n f_j^2 \le \sigma^2 + n^{-1/2}\|\mathbb{G}_n\|_{\mathcal{F}\cdot\mathcal{F}}$ for all $1 \le j \le N$. So, by the same argument as that used in Step 4 of the proof of Theorem 2.1, we have


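$$M_{n,Y}(\delta) = \mathrm{E}\Bigl[\max_{1\le j\le N}|Y_j|^3\cdot 1\Bigl\{\max_{1\le j\le N}|Y_j| > \delta\sqrt{n}/\log N\Bigr\}\Bigm| X_1^n\Bigr] \lesssim n^{-2}(\delta\sqrt{n}/\log N)^3,$$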



since $\delta \ge C(\sigma^2 + n^{-1/2}\|\mathbb{G}_n\|_{\mathcal{F}\cdot\mathcal{F}})^{1/2}n^{-1/2}(\log^{3/2}N)\cdot(\log n)$ for sufficiently large $C$. The assertion of this step then follows from Theorem 3.1.
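To make $L_n$ and $M_{n,X}(\delta)$ concrete, the sketch below computes both from an $n \times N$ array of centered values $\check X_{ij} = f_j(X_i) - P_n f_j$, stored as nested Python lists. The function names are hypothetical. The sketch also shows why $M_{n,X}(\delta)$ vanishes once the truncation level $\delta\sqrt{n}/\log N$ exceeds the largest entry, which is how (29) enters above.

```python
import math

def center_columns(fmat):
    """x[i][j] = f_j(X_i) - P_n f_j, given fmat[i][j] = f_j(X_i)."""
    n, N = len(fmat), len(fmat[0])
    means = [sum(row[j] for row in fmat) / n for j in range(N)]
    return [[row[j] - means[j] for j in range(N)] for row in fmat]

def third_moment_Ln(x):
    """L_n = max_j n^{-1} sum_i |x_ij|^3."""
    n, N = len(x), len(x[0])
    return max(sum(abs(row[j]) ** 3 for row in x) / n for j in range(N))

def truncated_moment_MnX(x, delta):
    """M_{n,X}(delta) = n^{-1} sum_i max_j |x_ij|^3 * 1{max_j |x_ij| > delta sqrt(n) / log N}."""
    n, N = len(x), len(x[0])        # N >= 2 so that log N > 0
    level = delta * math.sqrt(n) / math.log(N)
    total = 0.0
    for row in x:
        m = max(abs(v) for v in row)
        if m > level:               # only rows exceeding the truncation level contribute
            total += m ** 3
    return total / n

x = [[1.0, -2.0], [3.0, 0.0], [-4.0, 2.0]]   # small illustrative array, n = 3, N = 2
print(third_moment_Ln(x), truncated_moment_MnX(x, 1e-6), truncated_moment_MnX(x, 10.0))
```

With a large enough $\delta$, every row falls below the truncation level, so the truncated moment is exactly zero.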
Step 4. In the previous step, take

$$\delta = C\Bigl(\frac{(b\sigma^2K_n^2)^{1/3}}{\gamma^{1/3}n^{1/6}} + \frac{bK_n}{\gamma^{1/3+1/q}n^{1/2-1/q}}\Bigr),$$

where $C > 0$ is a large constant that can be chosen to depend only on $q$. It is not difficult to check that for this choice of $\delta$, (31) holds under the condition A/ < n. The desired inequality (22) then follows from combining (23), (24), and Steps 1, 2, and 3 with this choice of $\delta$. ■
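The choice of $\delta$ in Step 4 can be read off from the two leading error terms of Step 3, $b\sigma^2K_n^2/(\delta^3n^{1/2})$ and $b^3K_n^3/(\delta^3\gamma^{3/q}n^{3/2-3/q})$. If the constant in the definition of $\delta$ is at least $1$, then $\delta$ dominates each of its two summands. Cubing then gives the implications below, which use only elementary algebra. So each of those terms is at most $\gamma$.

```latex
\begin{aligned}
\delta \ge \frac{(b\sigma^2 K_n^2)^{1/3}}{\gamma^{1/3} n^{1/6}}
  &\;\Longrightarrow\; \frac{b\sigma^2 K_n^2}{\delta^3 n^{1/2}} \le \gamma, \\
\delta \ge \frac{bK_n}{\gamma^{1/3+1/q} n^{1/2-1/q}}
  &\;\Longrightarrow\; \frac{b^3 K_n^3}{\delta^3 \gamma^{3/q} n^{3/2-3/q}} \le \gamma .
\end{aligned}
```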


4.4. Proof of Lemma 2.2

We begin by proving the following lemma.

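Lemma 4.3. Let $X = (X_1, \dots, X_p)^T$ be a possibly non-centered Gaussian random vector with $\sigma_j^2 := \mathrm{Var}(X_j) > 0$, $1 \le j \le p$. Then for every $\varepsilon > 0$,

$$\sup_{t\in\mathbb{R}}\mathrm{P}\Bigl(\Bigl|\max_{1\le j\le p}X_j - t\Bigr| \le \varepsilon\Bigr) \le \frac{2\varepsilon}{\sigma}\bigl(\sqrt{2\log p} + 2\bigr),$$

where $\sigma = \min_{1\le j\le p}\sigma_j$.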


The lemma follows from the following result, due essentially to Nazarov [21]; see also [?].


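Lemma 4.4 (Nazarov's inequality). Let $W$ be a standard Gaussian random vector in $\mathbb{R}^m$, that is, $W \sim N(0, I)$. Let $A \subset \mathbb{R}^m$ be the intersection of $p$ half-spaces. Here a half-space in $\mathbb{R}^m$ is a set of the form $\{w \in \mathbb{R}^m : a^Tw \le t\}$ for some $a \in \mathbb{R}^m$ with $\|a\| = 1$ and some $t \in \mathbb{R}$. Then

$$\lim_{\varepsilon\downarrow 0}\frac{1}{\varepsilon}\mathrm{P}(W \in A^\varepsilon\setminus A) \le \sqrt{2\log p} + 2,$$

where the limit on the left-hand side exists.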


Proof of Lemma 4.3. It is clear that the distribution of $\max_{1\le j\le p}X_j$ is absolutely continuous, so let $f(\cdot)$ denote its density. Let $W$ be a standard Gaussian random vector in $\mathbb{R}^p$, and let $\mu = (\mu_1, \dots, \mu_p)^T = \mathrm{E}[X]$ and $\Sigma = \mathrm{E}[(X - \mu)(X - \mu)^T]$. Then $X \stackrel{d}{=} \Sigma^{1/2}W + \mu$. Denote by $\sigma_j a_j^T$ the $j$-th row of $\Sigma^{1/2}$, where $a_j \in \mathbb{R}^p$ with $\|a_j\| = 1$. We obtain

$$\max_{1\le j\le p}(\Sigma^{1/2}W + \mu)_j \le t \iff a_j^TW \le (t - \mu_j)/\sigma_j, \quad \forall\, 1 \le j \le p.$$
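The proof uses only one property of $\Sigma^{1/2}$: its $j$-th row has Euclidean norm $\sigma_j$, because $(\Sigma^{1/2}\Sigma^{1/2})_{jj} = \Sigma_{jj} = \sigma_j^2$. The same holds for any factor $L$ with $LL^T = \Sigma$. The sketch below checks this numerically with a Cholesky factor, which we swap in for the symmetric square root only because it is easy to compute in plain Python. It also extracts the unit vectors $a_j$ that define the half-spaces. The covariance matrix is a made-up example.

```python
import math

def cholesky(S):
    """Lower-triangular L with L L^T = S, for a symmetric positive definite matrix S."""
    p = len(S)
    L = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(S[i][i] - s)
            else:
                L[i][j] = (S[i][j] - s) / L[j][j]
    return L

def rows_as_scale_times_unit(L):
    """Write row j of L as sigma_j * a_j with ||a_j|| = 1; returns (sigmas, unit_rows)."""
    sigmas = [math.sqrt(sum(v * v for v in row)) for row in L]
    units = [[v / s for v in row] for row, s in zip(L, sigmas)]
    return sigmas, units

Sigma = [[4.0, 1.0, 0.5],
         [1.0, 9.0, 2.0],
         [0.5, 2.0, 1.0]]
L = cholesky(Sigma)
sigmas, a = rows_as_scale_times_unit(L)
print(sigmas)   # should match the square roots of the diagonal of Sigma
```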
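Let $A_t = \{w \in \mathbb{R}^p : a_j^Tw \le (t - \mu_j)/\sigma_j,\ \forall\, 1 \le j \le p\}$ for $t \in \mathbb{R}$, and observe that

$$f(t) = \lim_{\varepsilon\downarrow 0}\frac{1}{\varepsilon}\mathrm{P}(W \in A_{t+\varepsilon}\setminus A_t) \quad \text{a.e. } t \in \mathbb{R}.$$

Moreover, since

$$A_{t+\varepsilon} \subset \{w \in \mathbb{R}^p : a_j^Tw \le (t - \mu_j)/\sigma_j + \varepsilon/\sigma,\ \forall\, 1 \le j \le p\},$$

we have by Lemma 4.4

$$\frac{1}{\varepsilon}\mathrm{P}(W \in A_{t+\varepsilon}\setminus A_t) \le \frac{1}{\varepsilon}\mathrm{P}(W \in A_t^{\varepsilon/\sigma}\setminus A_t) + o(1) \le \frac{1}{\sigma}\bigl(\sqrt{2\log p} + 2\bigr) + o(1), \quad \varepsilon \downarrow 0,$$

which leads to $f(t) \le (1/\sigma)(\sqrt{2\log p} + 2)$ a.e. ■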
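As a sanity check on the density bound $f(t) \le \sigma^{-1}(\sqrt{2\log p} + 2)$, take the special case of $p$ independent $N(0,1)$ coordinates, so that $\sigma = 1$. This case is our illustrative choice. The maximum then has density $p\,\phi(t)\Phi(t)^{p-1}$. The sketch evaluates it on a grid using only the standard library and compares its supremum with the bound.

```python
import math

def std_normal_pdf(t):
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)

def std_normal_cdf(t):
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

def max_density(t, p):
    """Density at t of max_j X_j for p i.i.d. N(0, 1) coordinates: p * phi(t) * Phi(t)^(p-1)."""
    return p * std_normal_pdf(t) * std_normal_cdf(t) ** (p - 1)

def sup_max_density(p, lo=-10.0, hi=10.0, steps=20001):
    """Grid approximation of sup_t of the density above."""
    h = (hi - lo) / (steps - 1)
    return max(max_density(lo + k * h, p) for k in range(steps))

def density_bound(p, sigma=1.0):
    """The bound (sqrt(2 log p) + 2) / sigma obtained in the proof of Lemma 4.3."""
    return (math.sqrt(2.0 * math.log(p)) + 2.0) / sigma

for p in (2, 10, 100, 10_000, 1_000_000):
    print(p, round(sup_max_density(p), 3), round(density_bound(p), 3))
```

The bound holds with room to spare in this independent case; Nazarov's inequality is needed for arbitrary correlation.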

We are now in a position to prove Lemma 2.2.

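Proof of Lemma 2.2. Pick any $\delta > 0$, and let $\{t_1, \dots, t_N\}$ be a $\delta$-net of $(T, d)$ with $N = N(T, d, \delta)$. Then

$$\Bigl|\sup_{t\in T}X(t) - \max_{1\le j\le N}X(t_j)\Bigr| \le \sup_{(s,t)\in T_\delta}|X(t) - X(s)| =: \zeta,$$

so that for every $x \in \mathbb{R}$ and $\varepsilon, r' > 0$,

$$\mathrm{P}\Bigl(\Bigl|\sup_{t\in T}X(t) - x\Bigr| \le \varepsilon\Bigr) \le \mathrm{P}\Bigl(\Bigl|\max_{1\le j\le N}X(t_j) - x\Bigr| \le \varepsilon + r'\Bigr) + \mathrm{P}(\zeta > r').$$

By Lemma 4.3, the first term on the right-hand side is bounded by

$$\frac{2}{\sigma}(\varepsilon + r')\bigl(\sqrt{2\log N} + 2\bigr).$$

On the other hand, by the Borell-Sudakov-Tsirelson inequality, for every $r > 0$,

$$\mathrm{P}(\zeta > \mathrm{E}[\zeta] + r\delta) \le e^{-r^2/2}.$$

By taking $r' = \mathrm{E}[\zeta] + r\delta = \phi(\delta) + r\delta$, we obtain the desired conclusion. ■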
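Combining the displays of the last proof with $r' = \phi(\delta) + r\delta$, where $\phi(\delta) = \mathrm{E}[\zeta]$, gives an explicit bound. It holds for every $x \in \mathbb{R}$ and $\varepsilon, \delta, r > 0$, with $\sigma$ a lower bound on the standard deviations of the $X(t_j)$ as required by Lemma 4.3. The statement of Lemma 2.2 itself may package this bound differently.

```latex
\mathrm{P}\Bigl(\bigl|\sup_{t\in T} X(t) - x\bigr| \le \varepsilon\Bigr)
  \le \frac{2}{\sigma}\bigl(\varepsilon + \phi(\delta) + r\delta\bigr)
      \bigl(\sqrt{2\log N(T,d,\delta)} + 2\bigr) + e^{-r^2/2}.
```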
5. Proofs for Section 3

We begin by proving the following lemma.

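Lemma 5.1. Let $\delta > 0$. For every Borel subset $A$ of $\mathbb{R}$, there exists a smooth function $g : \mathbb{R} \to \mathbb{R}$ with the following properties: $\|g'\|_\infty \le \delta^{-1}$, $\|g''\|_\infty \le K\delta^{-2}$, $\|g'''\|_\infty \le K\delta^{-3}$, where $K$ is an absolute constant, and $1_A(t) \le g(t) \le 1_{A^{3\delta}}(t)$ for all $t \in \mathbb{R}$.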

Proof. The proof is similar to that of Lemma 18 in Chapter 10 of [?], except that we employ a compactly supported smoother. Let $\rho$ denote the Euclidean distance on $\mathbb{R}$, and consider the function $h(t) = (1 - \rho(t, A^\delta)/\delta)_+$. Observe that $h$ is a bounded Lipschitz function with Lipschitz constant $\delta^{-1}$. Let $\varphi : \mathbb{R} \to \mathbb{R}$ be the function defined by $\varphi(t) = C\exp(1/(t^2 - 1))$ for $|t| < 1$ and $\varphi(t) = 0$ for $|t| \ge 1$, where the constant $C$ is chosen so that $\int_{\mathbb{R}}\varphi(t)\,dt = 1$. Note that $\varphi$ is infinitely differentiable with support $[-1, 1]$. Define $g : \mathbb{R} \to \mathbb{R}$ by

$$g(t) = \int_{\mathbb{R}}h(t + \delta z)\varphi(z)\,dz = \delta^{-1}\int_{\mathbb{R}}h(y)\varphi(\delta^{-1}(y - t))\,dy.$$

Then it is routine to verify that $g$ is infinitely differentiable and that $\|g'\|_\infty \le \delta^{-1}$, $\|g''\|_\infty \le K\delta^{-2}$, $\|g'''\|_\infty \le K\delta^{-3}$. In addition, for $t \in A$, we have $h(t + \delta z) = 1$ if $|z| \le 1$, and $\varphi(z) = 0$ if $|z| > 1$. Hence $1_A(t) \le g(t)$. Meanwhile, for $t \notin A^{3\delta}$, we have $h(t + \delta z) = 0$ if $|z| \le 1$, and $\varphi(z) = 0$ if $|z| > 1$. Hence $g(t) \le 1_{A^{3\delta}}(t)$. ■
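The smoothing construction is easy to reproduce numerically. The sketch below makes three simplifications, all illustrative:

- $A$ is taken to be an interval $[lo, hi]$.
- The integral defining $g$ is replaced by a midpoint rule on $[-1, 1]$.
- The bump is normalized numerically rather than through the constant $C$.

It then checks $1_A \le g \le 1_{A^{3\delta}}$ and the Lipschitz bound $|g(t+s) - g(t)| \le s/\delta$ on a grid. Names are hypothetical.

```python
import math

M_NODES = 1000                                    # midpoint-rule nodes on [-1, 1]
DZ = 2.0 / M_NODES
NODES = [-1.0 + (k + 0.5) * DZ for k in range(M_NODES)]

def bump(z):
    """Unnormalized smooth bump exp(1/(z^2 - 1)) on (-1, 1), zero elsewhere."""
    return math.exp(1.0 / (z * z - 1.0)) if abs(z) < 1.0 else 0.0

_RAW = [bump(z) for z in NODES]
_TOTAL = sum(_RAW)
WEIGHTS = [w / _TOTAL for w in _RAW]              # discrete stand-in for phi(z) dz, sums to 1

def dist_to_interval(t, lo, hi):
    """Euclidean distance from t to the interval [lo, hi]."""
    return max(lo - t, 0.0, t - hi)

def h(t, lo, hi, delta):
    """h(t) = (1 - rho(t, A^delta) / delta)_+ with A = [lo, hi], so A^delta = [lo - delta, hi + delta]."""
    return max(0.0, 1.0 - dist_to_interval(t, lo - delta, hi + delta) / delta)

def g(t, lo, hi, delta):
    """Midpoint-rule approximation of g(t) = int h(t + delta z) phi(z) dz."""
    return sum(w * h(t + delta * z, lo, hi, delta) for z, w in zip(NODES, WEIGHTS))

lo, hi, delta = 0.0, 1.0, 0.25
grid = [-2.0 + 0.01 * k for k in range(501)]
values = [g(t, lo, hi, delta) for t in grid]
```

The discrete $g$ is a convex combination of $\delta^{-1}$-Lipschitz functions. So it inherits the Lipschitz bound exactly, not just approximately.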

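Proof of Theorem 3.1. Here we write $a \lesssim b$ if there exists a universal constant $C > 0$ such that $a \le Cb$. Fix $\delta > 0$, and let $\beta = \delta^{-1}\log p$. Since $p \ge 2$, we have $1/\delta \lesssim \beta$. Let $A$ be a Borel subset of $\mathbb{R}$. Let $e_\beta = \beta^{-1}\log p\ (= \delta)$. Using Lemma 5.1, we can construct a smooth function $g : \mathbb{R} \to \mathbb{R}$ with two properties. First, $\|g'\|_\infty \le \delta^{-1}$, $\|g''\|_\infty \le K\delta^{-2}$, $\|g'''\|_\infty \le K\delta^{-3}$ for some absolute constant $K > 0$. Second, $1_{A^{e_\beta}}(t) \le g(t) \le 1_{A^{e_\beta+3\delta}}(t)$ for all $t \in \mathbb{R}$. In addition, let $\mu = (\mu_1, \dots, \mu_p)^T$ and consider the function $F_\beta : \mathbb{R}^p \to \mathbb{R}$ defined by $F_\beta(x) = \beta^{-1}\log(\sum_{j=1}^p e^{\beta(x_j+\mu_j)})$, $x \in \mathbb{R}^p$. Then $\max_{1\le j\le p}x_j \le F_\beta(x - \mu) \le \max_{1\le j\le p}x_j + e_\beta$ for all $x \in \mathbb{R}^p$. Hence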

$$\mathrm{P}(Z \in A) \le \mathrm{P}\Bigl(F_\beta\Bigl(n^{-1/2}\sum_{i=1}^n X_i\Bigr) \in A^{e_\beta}\Bigr). \tag{32}$$
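Behind (32) is the usual log-sum-exp sandwich, written here equivalently as $\max_j(x_j + \mu_j) \le F_\beta(x) \le \max_j(x_j + \mu_j) + \beta^{-1}\log p$. The sketch below evaluates $F_\beta$ stably, by subtracting the largest exponent before exponentiating. It checks the sandwich with $\beta = \delta^{-1}\log p$, so that the slack $e_\beta$ equals $\delta$. The numbers are illustrative.

```python
import math

def smooth_max(x, mu, beta):
    """F_beta(x) = beta^{-1} log sum_j exp(beta (x_j + mu_j)), computed stably."""
    s = [beta * (xj + mj) for xj, mj in zip(x, mu)]
    top = max(s)
    return (top + math.log(sum(math.exp(v - top) for v in s))) / beta

x = [0.3, -1.2, 2.5, 2.4]
mu = [0.0, 0.5, -0.1, 0.2]
delta = 0.1
p = len(x)
beta = math.log(p) / delta                         # so that e_beta = beta^{-1} log p = delta
hard_max = max(xj + mj for xj, mj in zip(x, mu))
soft_max = smooth_max(x, mu, beta)
print(hard_max, soft_max)                          # soft_max lies in [hard_max, hard_max + delta]
```

The upper end of the sandwich is attained when all coordinates tie. In that case the sum contains $p$ equal terms.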

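Next, let $m = g \circ F_\beta$. Then, as in the proof of Lemma 5.1 in [9] (see also [?]), there exist functions $U_{jkl} : \mathbb{R}^p \to \mathbb{R}$, $1 \le j, k, l \le p$, such that

$$|\partial_j\partial_k\partial_l m(x)| \le U_{jkl}(x), \quad \forall x \in \mathbb{R}^p,$$

$$\sum_{j,k,l=1}^p U_{jkl}(x) \lesssim (\delta^{-3} + \beta\delta^{-2} + \beta^2\delta^{-1}) \lesssim \delta^{-3}\log^2 p, \quad \forall x \in \mathbb{R}^p,$$

$$U_{jkl}(x) \lesssim U_{jkl}(x + y) \lesssim U_{jkl}(x), \quad \forall x, y \in \mathbb{R}^p \text{ with } \max_{1\le j\le p}|y_j| \le \beta^{-1}.$$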


Hence, proceed as in Step 1 of the proof of Lemma 5.1 in [9], and observe that the term $\int\omega(t)\,\mathrm{E}[h(Z^{(n)}, \delta)]\,dt$ in that paper is trivially bounded by a universal constant. One can then show that for some universal constant $c > 0$,


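$$\Bigl|\mathrm{E}\Bigl[m\Bigl(n^{-1/2}\sum_{i=1}^n X_i\Bigr)\Bigr] - \mathrm{E}\Bigl[m\Bigl(n^{-1/2}\sum_{i=1}^n Y_i\Bigr)\Bigr]\Bigr| \lesssim \frac{\log^2 p}{\delta^3\sqrt{n}}\bigl\{L_n + M_{n,X}(c\delta) + M_{n,Y}(c\delta)\bigr\} =: I,$$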

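which implies that for some universal constant $C$,

$$\begin{aligned}
\mathrm{P}\Bigl(F_\beta\Bigl(n^{-1/2}\sum_{i=1}^n X_i\Bigr) \in A^{e_\beta}\Bigr)
&\le \mathrm{E}\Bigl[m\Bigl(n^{-1/2}\sum_{i=1}^n X_i\Bigr)\Bigr]
 \le \mathrm{E}\Bigl[m\Bigl(n^{-1/2}\sum_{i=1}^n Y_i\Bigr)\Bigr] + CI \\
&\le \mathrm{P}\Bigl(F_\beta\Bigl(n^{-1/2}\sum_{i=1}^n Y_i\Bigr) \in A^{e_\beta+3\delta}\Bigr) + CI
 \le \mathrm{P}\bigl(\tilde Z \in A^{2e_\beta+3\delta}\bigr) + CI.
\end{aligned}$$

Combining this inequality with (32) leads to the conclusion of the theorem.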


Proof of Theorem 3.2. Since $p \ge 2$, the assertion is trivial if $\Delta/\delta^2 > 1$. Therefore, throughout the proof, we assume that $\Delta/\delta^2 \le 1$. Let $\beta > 0$, and define $F_\beta : \mathbb{R}^p \to \mathbb{R}$ by $F_\beta(x) = \beta^{-1}\log(\sum_{j=1}^p e^{\beta(x_j+\mu_j)})$, where $x = (x_1, \dots, x_p)^T$ and $\mu = (\mu_1, \dots, \mu_p)^T$. As in the proof of Theorem 3.1, it can be shown that for every $g \in C^2(\mathbb{R})$, the function $m = g \circ F_\beta$ satisfies the inequality

$$\sum_{j,k=1}^p|\partial_j\partial_k m(x)| \le \|g''\|_\infty + 2\|g'\|_\infty\beta$$

for all $x \in \mathbb{R}^p$. We now use the same arguments as in the proof of Theorem 1 and Comment 1 in [?], with $X$ and $Y$ replaced by $X - \mu$ and $Y - \mu$, respectively. This gives

$$\Bigl|\mathrm{E}\Bigl[g\Bigl(\max_{1\le j\le p}X_j\Bigr)\Bigr] - \mathrm{E}\Bigl[g\Bigl(\max_{1\le j\le p}Y_j\Bigr)\Bigr]\Bigr| \le \|g''\|_\infty\Delta/2 + 2\|g'\|_\infty\sqrt{2\Delta\log p}.$$
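Explicit formulas confirm the second-derivative bound used above. Write $\pi_j(x) = e^{\beta(x_j+\mu_j)}/\sum_k e^{\beta(x_k+\mu_k)}$. Then:

- The first and second derivatives of $F_\beta$ are $\partial_jF_\beta = \pi_j$ and $\partial_j\partial_kF_\beta = \beta(\pi_j1\{j=k\} - \pi_j\pi_k)$.
- Hence $\partial_j\partial_k m = g''(F_\beta)\pi_j\pi_k + g'(F_\beta)\partial_j\partial_kF_\beta$.
- The absolute values sum to at most $\|g''\|_\infty + 2\beta\|g'\|_\infty$, because $\sum_{j,k}|\pi_j1\{j=k\} - \pi_j\pi_k| = 2\sum_j\pi_j(1 - \pi_j) \le 2$.

The sketch evaluates the left-hand side at random points for the illustrative choice $g = \sin$, for which both sup-norms equal 1.

```python
import math
import random

def softmax_weights(x, mu, beta):
    """pi_j = exp(beta (x_j + mu_j)) / sum_k exp(beta (x_k + mu_k)), computed stably."""
    s = [beta * (a + b) for a, b in zip(x, mu)]
    top = max(s)
    e = [math.exp(v - top) for v in s]
    tot = sum(e)
    return [v / tot for v in e]

def smooth_max(x, mu, beta):
    """F_beta(x) = beta^{-1} log sum_j exp(beta (x_j + mu_j))."""
    s = [beta * (a + b) for a, b in zip(x, mu)]
    top = max(s)
    return (top + math.log(sum(math.exp(v - top) for v in s))) / beta

def hessian_abs_sum(x, mu, beta, g1, g2):
    """sum_{j,k} |d_j d_k (g o F_beta)(x)| from closed-form derivatives; g1 = g', g2 = g''."""
    pi = softmax_weights(x, mu, beta)
    F = smooth_max(x, mu, beta)
    total = 0.0
    for j in range(len(x)):
        for k in range(len(x)):
            d2F = beta * ((pi[j] if j == k else 0.0) - pi[j] * pi[k])
            total += abs(g2(F) * pi[j] * pi[k] + g1(F) * d2F)
    return total

rng = random.Random(1)
beta, p = 5.0, 6
mu = [rng.uniform(-1.0, 1.0) for _ in range(p)]
samples = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(200)]
values = [hessian_abs_sum(x, mu, beta, math.cos, lambda t: -math.sin(t)) for x in samples]
print(max(values), 1.0 + 2.0 * beta)   # observed maximum versus ||g''|| + 2 beta ||g'||
```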
Now, take any Borel subset $A$ of $\mathbb{R}$. By Lemma 5.1, we can construct a function $g \in C^2(\mathbb{R})$ with $\|g'\|_\infty \le \delta^{-1}$ and $\|g''\|_\infty \le K\delta^{-2}$ for some absolute constant $K$, such that $1_A(t) \le g(t) \le 1_{A^{3\delta}}(t)$ for all $t \in \mathbb{R}$. For this $g$ and some absolute constant $C$, we have


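$$\begin{aligned}
\mathrm{P}\Bigl(\max_{1\le j\le p}X_j \in A\Bigr)
&\le \mathrm{E}\Bigl[g\Bigl(\max_{1\le j\le p}X_j\Bigr)\Bigr]
 \le \mathrm{E}\Bigl[g\Bigl(\max_{1\le j\le p}Y_j\Bigr)\Bigr] + C\bigl(\Delta\delta^{-2} + \delta^{-1}\sqrt{\Delta\log p}\bigr) \\
&\le \mathrm{P}\Bigl(\max_{1\le j\le p}Y_j \in A^{3\delta}\Bigr) + C\bigl(\Delta\delta^{-2} + \delta^{-1}\sqrt{\Delta\log p}\bigr) \\
&\le \mathrm{P}\Bigl(\max_{1\le j\le p}Y_j \in A^{3\delta}\Bigr) + C\sqrt{(\Delta/\delta^2)\log p},
\end{aligned}$$

where the last line follows from the facts that $\Delta/\delta^2 \le 1$ and $p \ge 2$. The conclusion of the theorem follows by replacing $\delta$ with $\delta/3$. ■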
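The last inequality in that display can be made explicit with a constant. Since $\Delta/\delta^2 \le 1$, we have $\Delta/\delta^2 \le \sqrt{\Delta/\delta^2}$. Since $p \ge 2$, we have $\log p \ge \log 2$. Together these give

```latex
\frac{\Delta}{\delta^2} + \frac{\sqrt{\Delta \log p}}{\delta}
  \le \sqrt{\frac{\Delta}{\delta^2}} + \sqrt{\frac{\Delta}{\delta^2}\log p}
  \le \Bigl(1 + \frac{1}{\sqrt{\log 2}}\Bigr)\sqrt{\frac{\Delta}{\delta^2}\log p}.
```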


6. Some technical tools 


Lemma 6.1. Let $X_1, \dots, X_n$ be i.i.d. random variables taking values in a measurable space $(S, \mathcal{S})$ with common distribution $P$. Let $\mathcal{F}$ be a pointwise measurable class of functions $f : S \to \mathbb{R}$ with an attached measurable envelope $F$. Consider the empirical process $\mathbb{G}_n f = n^{-1/2}\sum_{i=1}^n(f(X_i) - Pf)$, $f \in \mathcal{F}$. Let $\sigma^2 > 0$ be a constant such that $\sup_{f\in\mathcal{F}}Pf^2 \le \sigma^2 \le \|F\|_{P,2}^2$. Let $M = \max_{1\le i\le n}F(X_i)$. Suppose that $F \in L^q(P)$ for some $q \ge 2$. Then for every $t \ge 1$, with probability at least $1 - t^{-q/2}$,

$$\|\mathbb{G}_n\|_{\mathcal{F}} \le (1 + \alpha)\mathrm{E}[\|\mathbb{G}_n\|_{\mathcal{F}}] + K_q\bigl\{(\sigma + n^{-1/2}\|M\|_q)\sqrt{t} + \alpha^{-1}n^{-1/2}\|M\|_2\,t\bigr\}, \quad \forall\alpha > 0,$$

where $K_q > 0$ is a constant that depends only on $q$.

Proof. The lemma is essentially due to [2], Theorem 12. See Theorem 5.1 in [6] for the version stated here. ■


Lemma 6.2. Consider the setting of Lemma 6.1. In addition, suppose that there exist constants $A \ge e$ and $v \ge 1$ such that $\sup_Q N(\mathcal{F}, e_Q, \varepsilon\|F\|_{Q,2}) \le (A/\varepsilon)^v$, $0 < \varepsilon \le 1$. Then

$$\mathrm{E}[\|\mathbb{G}_n\|_{\mathcal{F}}] \le K\Biggl(\sqrt{v\sigma^2\log\Bigl(\frac{A\|F\|_{P,2}}{\sigma}\Bigr)} + \frac{v\|M\|_2}{\sqrt{n}}\log\Bigl(\frac{A\|F\|_{P,2}}{\sigma}\Bigr)\Biggr),$$

where $K$ is an absolute constant.

Proof. See Corollary 5.1 in [?]. ■

Lemma 6.3 (Talagrand's inequality). Consider the setting of Lemma 6.2, but suppose now that the envelope $F$ is bounded by a constant $b > 0$. Let $\sigma^2 > 0$ be a constant such that $\sup_{f\in\mathcal{F}}Pf^2 \le \sigma^2 \le b^2$. If $b^2v\log(Ab/\sigma) \le n\sigma^2$, then for every $0 < t \le n\sigma^2/b^2$,

$$\mathrm{P}\Bigl\{\|\mathbb{G}_n\|_{\mathcal{F}} > K\sigma\sqrt{t \vee (v\log(Ab/\sigma))}\Bigr\} \le e^{-t},$$

where $K$ is an absolute constant.


Proof. This form of Talagrand's inequality is taken from Theorem B.1 in [?]; the original references go back to [24], [19], and [15]. ■


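Lemma 6.4. Let $X_1, \dots, X_n$ be independent random vectors in $\mathbb{R}^p$ with $p \ge 2$ such that $X_{ij} \ge 0$ for all $i = 1, \dots, n$ and $j = 1, \dots, p$. Define $Z := \max_{1\le j\le p}\sum_{i=1}^n X_{ij}$ and $M := \max_{1\le i\le n}\max_{1\le j\le p}X_{ij}$. Then

$$\mathrm{E}[Z] \le K\Bigl(\max_{1\le j\le p}\mathrm{E}\Bigl[\sum_{i=1}^n X_{ij}\Bigr] + \mathrm{E}[M]\log p\Bigr),$$

where $K$ is an absolute constant.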


Proof. See Lemma 9 in [?]. ■


Lemma 6.5. Assume the setting of Lemma 6.4. Then for every $\eta > 0$, $s \ge 1$ and $t > 0$,

$$\mathrm{P}(Z \ge (1 + \eta)\mathrm{E}[Z] + t) \le K\,\mathrm{E}[M^s]/t^s,$$

where $K = K(\eta, s)$ is a constant that depends only on $\eta$ and $s$.

Proof. See Lemma A.5 in [9]. ■


Lemma 6.6. Let $\xi$ be a nonnegative random variable such that $\mathrm{P}(\xi > x) \le Ae^{-x/B}$ for all $x \ge 0$ and for some constants $A, B > 0$. Then for every $t \ge 0$, $\mathrm{E}[\xi^3 1\{\xi > t\}] \le 6A(t + B)^3e^{-t/B}$.

Proof. See Lemma A.8 in [?]. ■
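An exponential random variable gives a quick check of Lemma 6.6 with $A = 1$. This example is our choice. By memorylessness, given $\xi > t$ the excess $\xi - t$ is again exponential with mean $B$. Hence $\mathrm{E}[\xi^3 1\{\xi > t\}] = e^{-t/B}(t^3 + 3t^2B + 6tB^2 + 6B^3)$. The sketch compares this closed form with a midpoint-rule integral and with the bound $6A(t + B)^3e^{-t/B}$.

```python
import math

def exp_truncated_third_moment(t, B):
    """Exact E[xi^3 1{xi > t}] for xi exponential with mean B, i.e. P(xi > x) = exp(-x/B)."""
    # Given xi > t, xi - t is exponential with mean B; its moments are B, 2B^2 and 6B^3.
    return math.exp(-t / B) * (t**3 + 3 * t**2 * B + 6 * t * B**2 + 6 * B**3)

def numeric_truncated_third_moment(t, B, span=60.0, steps=200_000):
    """Midpoint-rule value of int_t^inf x^3 (1/B) exp(-x/B) dx, truncated at t + span * B."""
    h = span * B / steps
    total = 0.0
    for k in range(steps):
        x = t + (k + 0.5) * h
        total += x**3 * math.exp(-x / B) / B
    return total * h

def lemma_6_6_bound(t, A, B):
    """Right-hand side 6A(t + B)^3 exp(-t/B) of Lemma 6.6."""
    return 6.0 * A * (t + B) ** 3 * math.exp(-t / B)

for t, B in [(0.0, 1.0), (1.0, 2.0), (5.0, 0.5)]:
    print(t, B, exp_truncated_third_moment(t, B), lemma_6_6_bound(t, 1.0, B))
```

At $t = 0$ both sides equal $6B^3$ in this example, so the constant $6$ in Lemma 6.6 cannot be lowered.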